- Download, unpack and move
- Setting up the environment variable
- Configure Sqoop with Hadoop
- Get JDBC required for your database
- Verifying Sqoop
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Unfortunately, the project has been retired and was moved to the Apache Attic in June 2021.
The last update on the Apache Sqoop page was on 2019-01-18.
Download, unpack and move
    nosql@nosql:~/Pulpit/nosql2$ ls -l
    razem 17540
    drwxr-xr-x 10 nosql nosql 4096 lip 21 15:44 apache-tinkerpop-gremlin-console-3.5.1
    -rw-rw-r-- 1 nosql nosql 17953604 lis 25 19:31 sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    nosql@nosql:~/Pulpit/nosql2$ tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    [... CUT ...]
    nosql@nosql:~/Pulpit/nosql2$ rm sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    nosql@nosql:~/Pulpit/nosql2$ ls -l
    razem 8
    drwxr-xr-x 10 nosql nosql 4096 lip 21 15:44 apache-tinkerpop-gremlin-console-3.5.1
    drwxr-xr-x 9 nosql nosql 4096 gru 19 2017 sqoop-1.4.7.bin__hadoop-2.6.0
    nosql@nosql:~/Pulpit/nosql2$ sudo mv sqoop-1.4.7.bin__hadoop-2.6.0 /usr/lib/sqoop
    [sudo] hasło użytkownika nosql:
    nosql@nosql:~/Pulpit/nosql2$ ls -l /usr/lib | grep sqoop
    drwxr-xr-x 9 nosql nosql 4096 gru 19 2017 sqoop
Setting up the environment variable
Open ~/.bashrc in a text editor:

    nosql@nosql:~/Pulpit/nosql2$ nano ~/.bashrc

and paste the following lines at the end of the file:

    #Sqoop
    export SQOOP_HOME=/usr/lib/sqoop
    export PATH=$PATH:$SQOOP_HOME/bin

Activate the environment variables with the following command:
    nosql@nosql:~/Pulpit/nosql2$ source ~/.bashrc
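
If you prefer not to edit the file by hand, the same three lines can be appended from a script. The sketch below is only an illustration: the helper names append_once and add_sqoop_env are my own and not part of Sqoop. It adds each line only when that exact line is not already in the file, so running it twice does not duplicate the entries.

```shell
# Append a line to a file only if that exact line is not already present.
# $1: target file, $2: line to add
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Add the Sqoop environment settings used in this post to the given file.
# (Hypothetical helper; the values match the manual steps above.)
add_sqoop_env() {
    append_once "$1" '#Sqoop'
    append_once "$1" 'export SQOOP_HOME=/usr/lib/sqoop'
    append_once "$1" 'export PATH=$PATH:$SQOOP_HOME/bin'
}

# Example call (commented out so that nothing is changed by accident):
# add_sqoop_env "$HOME/.bashrc"
```

After calling add_sqoop_env on your ~/.bashrc you still need to run source ~/.bashrc (or open a new terminal) for the variables to take effect.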
Configure Sqoop with Hadoop
Move to the Sqoop config directory and copy the template file using the following commands:

    nosql@nosql:~/Pulpit/nosql2$ cd $SQOOP_HOME/conf/
    nosql@nosql:/usr/lib/sqoop/conf$ ls -l
    razem 28
    -rw-rw-r-- 1 nosql nosql 3895 gru 19 2017 oraoop-site-template.xml
    -rw-rw-r-- 1 nosql nosql 1404 gru 19 2017 sqoop-env-template.cmd
    -rwxr-xr-x 1 nosql nosql 1345 gru 19 2017 sqoop-env-template.sh
    -rw-rw-r-- 1 nosql nosql 6044 gru 19 2017 sqoop-site-template.xml
    -rw-rw-r-- 1 nosql nosql 6044 gru 19 2017 sqoop-site.xml
    nosql@nosql:/usr/lib/sqoop/conf$ cp sqoop-env-template.sh sqoop-env.sh

Open sqoop-env.sh:

    nosql@nosql:/usr/lib/sqoop/conf$ nano sqoop-env.sh

and edit the following lines so that they point to your Hadoop installation (in my case /usr/local/hadoop):

    #Set path to where bin/hadoop is available
    export HADOOP_COMMON_HOME=/usr/local/hadoop

    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/usr/local/hadoop

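
Before going further it may be worth confirming that the directory you put into HADOOP_COMMON_HOME really contains bin/hadoop, as the comment in the template requires. The function below is a small sketch of such a check; check_hadoop_home is a name I made up for this post, and /usr/local/hadoop is simply the path from my setup.

```shell
# Succeed if the given directory contains an executable bin/hadoop,
# which is what sqoop-env.sh expects of HADOOP_COMMON_HOME.
# $1: candidate Hadoop installation directory
check_hadoop_home() {
    if [ -x "$1/bin/hadoop" ]; then
        echo "OK: found $1/bin/hadoop"
    else
        echo "MISSING: $1/bin/hadoop" >&2
        return 1
    fi
}

# Example call for the path used in this post:
# check_hadoop_home /usr/local/hadoop
```

If the check reports MISSING, adjust the two export lines in sqoop-env.sh to the directory where Hadoop is actually installed.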
Get JDBC required for your database
Download the JDBC driver required for your database (in my case it is the PostgreSQL driver) and move the jar file to the lib directory of your Sqoop installation:

    nosql@nosql:~/Pulpit/nosql2$ ls -l
    razem 996
    drwxr-xr-x 10 nosql nosql 4096 lip 21 15:44 apache-tinkerpop-gremlin-console-3.5.1
    -rw-rw-r-- 1 nosql nosql 1015689 lis 25 19:28 postgresql-42.3.1.jar
    nosql@nosql:~/Pulpit/nosql2$ mv postgresql-42.3.1.jar /usr/lib/sqoop/lib
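
To double-check that the driver ended up where Sqoop looks for it, you can search the lib directory for the jar. This is only a sketch: find_jdbc_jar is a hypothetical helper, and the pattern postgresql-*.jar assumes the PostgreSQL driver naming shown above, so change it for other databases.

```shell
# Print the first file in a directory that matches a file name pattern.
# $1: library directory, $2: glob pattern for the driver jar
find_jdbc_jar() {
    # $2 is intentionally unquoted so that the shell expands the glob.
    for candidate in "$1"/$2; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "No file matching $2 in $1" >&2
    return 1
}

# Example call for the setup described in this post:
# find_jdbc_jar /usr/lib/sqoop/lib 'postgresql-*.jar'
```

With the setup from this post, the example call should print the path to postgresql-42.3.1.jar inside /usr/lib/sqoop/lib.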
Verifying Sqoop
    nosql@nosql:~/Pulpit/nosql2$ cd $SQOOP_HOME/bin
    nosql@nosql:/usr/lib/sqoop/bin$ sqoop-version
    Warning: /usr/lib/sqoop/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /usr/lib/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /usr/lib/sqoop/../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    2021-11-25 20:25:40,608 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    Sqoop 1.4.7
    git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
    Compiled by maugli on Thu Dec 21 15:59:58 STD 2017