
Saving and retrieving data in Hadoop HDFS


Before we start

Note that in this tutorial hadoop is the Hadoop superuser account, while nosql is a typical, unprivileged user. I will write one of the following when the user is important:

or

In other cases I will simply write:
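For illustration, the three prompt forms could look like this (the host name is just a placeholder of my choosing):

    hadoop@host:~$ command    # run as the Hadoop superuser (hadoop)
    nosql@host:~$ command     # run as the regular user (nosql)
    $ command                 # when the user does not matter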


Install what you need

For this part you need to have installed:

  • Java
  • Hadoop
  • Sqoop
  • PostgreSQL

To install all required components, please follow the related documentation, or you can check how I did it:

The easiest way to install Java and PostgreSQL is with a package manager, like Synaptic, or simply from the command line (on Linux systems).
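For example, on a Debian/Ubuntu-like system this could be a sketch of the installation (exact package names, such as openjdk-8-jdk, depend on your distribution):

    # Install Java (OpenJDK) and PostgreSQL from the distribution repositories
    sudo apt update
    sudo apt install openjdk-8-jdk postgresql postgresql-contrib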


Getting data

Fireballs and bolides are astronomical terms for exceptionally bright meteors that are spectacular enough to be seen over a very wide area (see Fireball and Bolide Data).

You can get data about fireballs using the simple, publicly exposed Fireball Data API.

Create working directory

Either using wget

or with curl
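As a sketch of these steps (the working directory and output file names are my own choice; the Fireball Data API endpoint is https://ssd-api.jpl.nasa.gov/fireball.api):

    # Create a working directory for the data
    mkdir -p ~/fireball && cd ~/fireball

    # Download the fireball data as JSON, either with wget ...
    wget -O fireball.json "https://ssd-api.jpl.nasa.gov/fireball.api"

    # ... or with curl
    curl -o fireball.json "https://ssd-api.jpl.nasa.gov/fireball.api"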

Result

Now we will use a few command-line commands to extract the important data from this JSON and save it in a CSV file:

Notice that this part:

replaces the following, more complex, construction:

We can put part of this long command into a file named json2csv so we can reuse it more easily:
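The original pipeline is not reproduced here; as a rough equivalent, a json2csv script could be sketched with jq (the field handling is an assumption based on the API's "fields" and "data" arrays):

    #!/bin/bash
    # json2csv - convert Fireball API JSON (read from stdin) to CSV.
    # The API returns a "fields" array (the header) and a "data" array of rows;
    # both are emitted as CSV lines.
    jq -r '.fields, .data[] | @csv'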

Set the correct permissions so you can execute this file:

Now we can call it as

As all tests are positive, you can save the result to the fireball_data.csv file:
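A sketch of these three steps, assuming the json2csv script and the file names used above:

    # Make the script executable
    chmod +x json2csv

    # Test it on the downloaded JSON
    ./json2csv < fireball.json | head

    # Save the result to a CSV file
    ./json2csv < fireball.json > fireball_data.csv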


Create database and import data into it

Login to PostgreSQL. To do this, you have three options:

  1. best: add another superuser without touching the default postgres one [1];
  2. change the password of the default postgres user;
  3. add a "normal" user.

Please also read the following materials:

  1. How to Install PostgreSQL and phpPgAdmin on Ubuntu 20.04 LTS
  2. How to install PostgreSQL and phpPgAdmin
  3. What's the default superuser username/password for postgres after a new install?

I select the first option, and I'm going to add another superuser without touching the default postgres superuser.
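A minimal sketch of this option, assuming the new superuser is called nosql:

    # Create an additional PostgreSQL superuser (you will be asked for its password)
    sudo -u postgres createuser --superuser --pwprompt nosql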

Now it is possible to use phpPgAdmin at 127.0.0.1/phppgadmin:



Create database nosql
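If you prefer the command line over phpPgAdmin, a sketch could be:

    # Create the nosql database, owned by the nosql user
    sudo -u postgres createdb -O nosql nosql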



Create table


Put this code in the SQL textarea:
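The exact table definition depends on which fields you kept in the CSV; a sketch based on the Fireball API fields (column names and types are my assumptions) can be run with psql, or you can paste just the SQL into the phpPgAdmin textarea:

    # Create the table with psql (or paste the SQL part into phpPgAdmin)
    psql -U nosql -d nosql -c "
    CREATE TABLE fireball_data (
        date     TIMESTAMP,
        energy   NUMERIC,
        impact_e NUMERIC,
        lat      NUMERIC,
        lat_dir  CHAR(1),
        lon      NUMERIC,
        lon_dir  CHAR(1),
        alt      NUMERIC,
        vel      NUMERIC
    );"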




Import a CSV file into a table using the COPY statement
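A sketch of the import, assuming the table sketched above and the CSV created earlier (adjust the absolute path to your working directory):

    # Server-side COPY: the file must be readable by the PostgreSQL server process
    psql -U nosql -d nosql -c "COPY fireball_data FROM '/home/nosql/fireball/fireball_data.csv' DELIMITER ',' CSV HEADER;"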


In case of problems with reading:


Please verify all rights to the directories and files. For example:
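For instance, a quick check and a possible fix (the paths are placeholders):

    # Check who can traverse the directory and read the file
    ls -ld /home/nosql/fireball
    ls -l  /home/nosql/fireball/fireball_data.csv

    # If needed, let others (including the postgres service user) read them
    chmod o+rx /home/nosql/fireball
    chmod o+r  /home/nosql/fireball/fireball_data.csv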


Export data from a table to CSV using the COPY statement
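A sketch of the export, mirroring the import above (the target path is an assumption; it must be writable by the PostgreSQL server process):

    psql -U nosql -d nosql -c "COPY fireball_data TO '/tmp/fireball_data_from_postgresql.csv' DELIMITER ',' CSV HEADER;"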

In case of problems with saving,


create an empty file manually and set the correct permissions:
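For example (assuming the export path from the sketch above):

    # Pre-create the target file and make it writable for the PostgreSQL server process
    touch /tmp/fireball_data_from_postgresql.csv
    chmod o+w /tmp/fireball_data_from_postgresql.csv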

Compare the exported file (fireball_data_from_postgresql.csv):

with the original file (fireball_data.csv):
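For instance:

    # No output from diff means the files are identical
    diff fireball_data.csv fireball_data_from_postgresql.csv

    # Or just compare the first few lines by eye
    head -n 5 fireball_data.csv fireball_data_from_postgresql.csv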

You can also try to use the command line for the above, following this pattern (this is only an example - you have to adjust it to this tutorial's data):
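As a sketch of such a pattern, \copy runs on the client side, so ordinary user permissions apply instead of server-side file access (names are the ones used in this tutorial):

    # Client-side copy: psql itself reads/writes the file
    psql -U nosql -d nosql -c "\copy fireball_data TO 'fireball_data_from_postgresql.csv' DELIMITER ',' CSV HEADER"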


Start Hadoop

If it's not running yet, start Hadoop. Do this as a Hadoop superuser (hadoop in my case):
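A sketch, assuming the standard Hadoop sbin scripts are on the hadoop user's PATH:

    # Start HDFS (NameNode, DataNodes, SecondaryNameNode) and YARN
    start-dfs.sh
    start-yarn.sh

    # Verify which daemons are running
    jps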


Prepare user account

Now we have to be sure that all the necessary HDFS and Hadoop accounts for the user exist. In order to enable a new user to use your Hadoop cluster, follow these general steps.


Create OS Hadoop group and user

  1. Create the group
  2. If the user doesn't exist

    Create an OS account on the Linux system from which you want to let a user execute Hadoop jobs.

    Note:

    • -g The group name or number of the user's initial login group.
    • -G A list of supplementary groups which the user is also a member of.

    According to my tests, the -g option should be used to pass the Hadoop user verification process.

  3. If the user exists

    Create an OS account on the Linux system from which you want to let a user execute Hadoop jobs.

    Note:

    • -a appends the group to the user's existing groups. Without this, the new group will overwrite all existing supplementary groups when -G is used.
  4. To make the new group membership active you have to re-login (log out and then log in); a combined sketch of these steps follows this list:
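Assuming the group is called hadoopuser (as used later in this tutorial) and the user is nosql:

    # 1. Create the group
    sudo groupadd hadoopuser

    # 2. If the user doesn't exist yet: create it with hadoopuser as the initial login group (-g)
    sudo useradd -m -g hadoopuser nosql

    # 3. If the user already exists: append (-a) the supplementary group (-G)
    sudo usermod -a -G hadoopuser nosql

    # 4. Log out and log back in, then check the membership
    groups nosql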


Create HDFS user home directory and set permissions

  1. In order to create a new HDFS user, you need to create a directory under the /user directory. This directory will serve as the HDFS home directory for the user.
  2. Change the ownership of the directory, since you don’t want to use the default owner/group (hadoop/supergroup) for this directory.

    User nosql can now store the output of his/her MapReduce and other jobs under that user’s home directory in HDFS.
  3. Refresh the user and group mappings to let the NameNode know about the new user:
  4. Make sure that the permissions on the Hadoop temp directory (which is specified in the core-site.xml file) are set so that all Hadoop users can access it. The default temp directory is defined as below:

    • Check the existing ownership
    • Create the temp directory for the nosql user
    • Change the ownership of the newly created directory (owner to nosql, group to hadoopuser)
    • Change the rights of the temp directory

The new user can now log into the gateway servers and execute his or her Hadoop jobs and store data in HDFS.
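A combined sketch of the steps above, run as the hadoop superuser; the temp directory path is only an assumption based on the default hadoop.tmp.dir pattern (/tmp/hadoop-${user.name}), which your core-site.xml may override:

    # 1-2. Create the HDFS home directory for nosql and hand it over to the user
    hdfs dfs -mkdir -p /user/nosql
    hdfs dfs -chown nosql:hadoopuser /user/nosql

    # 3. Let the NameNode pick up the new user/group mapping
    hdfs dfsadmin -refreshUserToGroupsMappings

    # 4. Local temp directory (path assumed from the default hadoop.tmp.dir pattern)
    ls -ld /tmp/hadoop-*                      # check existing ownership
    sudo mkdir -p /tmp/hadoop-nosql           # create temp directory for the nosql user
    sudo chown nosql:hadoopuser /tmp/hadoop-nosql
    sudo chmod 755 /tmp/hadoop-nosql          # adjust rights as needed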


Import to HDFS


Using Sqoop

  • Put the JDBC connector in the correct directory (/usr/lib/sqoop/lib in my case; connector postgresql-42.2.18.jar).
  • Check the version
  • First try and first problem. Solving the java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils problem

    Locate all files having commons-lang in their name:

    Move the correct file to the Sqoop folder:
  • Second try and second problem. This problem may be related to incorrect JDBC URL syntax. You need to ensure that the JDBC URL conforms to the JDBC driver documentation, and keep in mind that it is usually case sensitive (see The infamous java.sql.SQLException: No suitable driver found: 2. Or, JDBC URL is in wrong syntax).
    • In the case of PostgreSQL it takes one of the following forms:
    • In the case of MySQL it takes the following form:
    • In the case of Oracle there are two URL syntaxes: the old syntax, which works only with an SID, and the new one, which uses the Oracle service name:
  • Success
  • Verify correctness
  • Use the -m, --num-mappers argument to use n map tasks to import in parallel (a combined sketch follows this list)
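A sketch of the JDBC URL forms and of a sqoop import invocation, assuming the database, table and user names used earlier in this tutorial (host, port and target directory are placeholders):

    # JDBC URL forms (case sensitive):
    #   PostgreSQL: jdbc:postgresql://host:port/database
    #   MySQL:      jdbc:mysql://host:port/database
    #   Oracle:     jdbc:oracle:thin:@host:port:SID          (old, SID-based)
    #               jdbc:oracle:thin:@//host:port/service    (new, service-name-based)

    sqoop import \
      --connect jdbc:postgresql://127.0.0.1:5432/nosql \
      --username nosql -P \
      --table fireball_data \
      --target-dir /user/nosql/fireball_data \
      -m 1

    # Verify correctness
    hdfs dfs -ls /user/nosql/fireball_data
    hdfs dfs -cat /user/nosql/fireball_data/part-m-00000 | head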


Using copyFromLocal command

Displays first kilobyte of the file to stdout.
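A sketch using the file created earlier:

    # Copy the local CSV into the user's HDFS home directory
    hdfs dfs -copyFromLocal fireball_data.csv /user/nosql/fireball_data.csv

    # Display the first kilobyte of the file on stdout
    hdfs dfs -head /user/nosql/fireball_data.csv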


Getting data back from HDFS