Tuesday, 30 July 2019

Sqoop Jobs

For help on the sqoop job tool, type: sqoop help job
--create <job-id>: Define a new saved job with the specified job-id (name). A second Sqoop command line, separated by a --, should be specified; this defines the saved job.
--delete <job-id>: Delete a saved job.
--exec <job-id>: Given a job defined with --create, run the saved job.
--show <job-id>: Show the parameters for a saved job.
--list: List all saved jobs.
--meta-connect <jdbc-uri>: Specifies the JDBC connect string used to connect to the metastore.

While saving a sqoop command as a job, we have to specify a password file, so we need to pre-create the password file for the database we want to connect to.

Adding Password file
We cannot just type the password into a file and start using it, because editors append an end-of-line character.
The password file can be kept either in HDFS or on the local file system.
When specifying a password file on the local file system, we need to use the file:// protocol.
It is better to echo the password into the file: echo -n "password" > password_file
The -n option suppresses the trailing newline, so the file contains only the password.
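For example (a minimal sketch; the password value and file location are placeholders):

```shell
# -n suppresses the trailing newline that echo would otherwise append
echo -n "itversity" > password_file

# Verify: the byte count equals the password length (9), so there is no newline byte
wc -c < password_file

# Restrict permissions so only the owner can read the password
chmod 400 password_file
```

The file can then be copied to HDFS with hdfs dfs -put if you prefer to reference it with an hdfs:// URI instead of file://.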
Let us run a sample sqoop command using the password file.
//password file on the local file system
sqoop eval \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  -e "SELECT count(1) FROM orders"

Creating Sqoop Job
Now that we have created the password file, let us understand how to create a sqoop job.
We can use sqoop job --create to create the job.
Typically we create jobs for sqoop import, sqoop export, etc.
Once we have identified the relevant sqoop command, we can create the job with that command and the password file.
We also need to assign a unique id (name) to the sqoop job.
Here is an example that creates a job for a sqoop import command. Once the job is created, we can list saved jobs using sqoop job --list.
sqoop job \
  --create import_orders \
  -- import \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  --table orders \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir

//show all sqoop jobs
sqoop job --list

//show all details with respect to sqoop job
sqoop job --show import_orders

//exec sqoop job
sqoop job --exec import_orders

//delete sqoop jobs
sqoop job --delete import_orders

Sqoop Incremental Imports
Incremental import arguments:
--check-column (col): Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.)
--incremental (mode): Specifies how Sqoop determines which rows are new. Legal values for mode are append (typically on a primary key) and lastmodified (on a date/timestamp column).
--last-value (value): Specifies the maximum value of the check column from the previous import.

Check the existing data before setting up an incremental import:
//password file on the local file system
sqoop eval \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  -e "SELECT Max(order_date) FROM orders"

//Output
Max(order_date)
2018-09-23 00:00:00.0
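To feed this value into a script, the table borders have to be stripped. A minimal sketch, assuming the usual pipe-delimited eval layout (the hard-coded sample text below stands in for the real sqoop eval output):

```shell
# Stand-in for the output of `sqoop eval` (the real command would be piped in instead)
EVAL_OUTPUT='-----------------------------
| Max(order_date)           |
-----------------------------
| 2018-09-23 00:00:00.0     |
-----------------------------'

# Keep only the pipe-delimited rows, take the last (data) row,
# then trim the leading and trailing pipes and padding
LAST_VALUE=$(printf '%s\n' "$EVAL_OUTPUT" \
  | grep '^|' | tail -n 1 \
  | sed 's/^| *//; s/ *|$//')

echo "$LAST_VALUE"   # prints 2018-09-23 00:00:00.0
```

This value is what we would later pass to --last-value.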
For incremental imports into HDFS we generally use append mode.
sqoop import \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  --table orders \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --append \
  --where "order_id > 883"
//check the map-reduce counters to see how many records were retrieved

//order_id is the primary key
sqoop import \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  --table orders \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --check-column order_id \
  --incremental append \
  --last-value 883
If there are any new records (order_id > 883), only those rows are imported.
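This check can be scripted so that an import only runs when there is something new. A minimal sketch; NEW_ROWS is hard-coded here for illustration (in practice it would be parsed from sqoop eval output), and the sqoop line is commented out:

```shell
#!/bin/sh
# Placeholder count: in practice this would come from parsing `sqoop eval` output,
# e.g. SELECT count(1) FROM orders WHERE order_id > 883
NEW_ROWS=5

if [ "$NEW_ROWS" -gt 0 ]; then
  echo "new rows found: $NEW_ROWS, running the import"
  # sqoop job --exec import_orders_incr
else
  echo "no new rows, skipping the import"
fi
```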

To create sqoop job @1:
//order_id is the primary key
sqoop job \
  --create import_orders_incr \
  -- import \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  --table orders \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --check-column order_id \
  --incremental append \
  --last-value 0

To get the properties related to the incremental load:
sqoop job --show import_orders_incr | grep incremental

In the first run of sqoop job @1, it will move all records from MySQL to HDFS.
When we then run sqoop job --show import_orders_incr | grep incremental, it gives us three values, for example:
incremental.last.value = 1000
incremental.col = order_id
incremental.mode = AppendRows

Now add some records in MySQL and rerun sqoop job --exec import_orders_incr; the new records will be moved to HDFS. We can run sqoop job --show import_orders_incr | grep incremental again to see the updated last value.

Incremental mode lastmodified (used when a date/timestamp column is present).
If the target directory already exists, remove it first; alternatively, Sqoop can merge re-imported rows into the existing data with --merge-key.
To create sqoop job @2:
//a new job name is used because a saved job named import_orders_incr already exists
//the check column should be a not-null date/timestamp column
sqoop job \
  --create import_orders_lastmod \
  -- import \
  --connect jdbc:mysql://dburl \
  --username retail_user \
  --password-file file:///home/training/password_file \
  --table orders \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --check-column order_date \
  --incremental lastmodified \
  --last-value "2018-07-23 00:00:00.0"

