For help on sqoop job, type: sqoop help job
While saving a sqoop command as a job, we have to specify a password file, so we need to pre-create the password file for the database we want to connect to.
| Argument | Description |
| --- | --- |
| --create <job-id> | Define a new saved job with the specified job-id (name). A second Sqoop command line, separated by a --, should be specified; this defines the saved job. |
| --delete <job-id> | Delete a saved job. |
| --exec <job-id> | Given a job defined with --create, run the saved job. |
| --show <job-id> | Show the parameters for a saved job. |
| --list | List all saved jobs. |
| --meta-connect <jdbc-uri> | Specifies the JDBC connect string used to connect to the metastore. |
Adding Password file
We cannot just add the password to a file and start using it, because most editors append an end-of-line character.
The password file can be kept either in HDFS or on the local file system.
When specifying a password file on the local file system, we need to use the file:// protocol.
It is better to echo the password and redirect it to a file; the -n flag suppresses the trailing newline: echo -n "password" > password_file
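The steps above can be sketched as a small shell snippet (password_file and the literal password are placeholders):

```shell
# Write the password without a trailing newline; a plain `echo` would append one.
echo -n "password" > password_file
# Restrict permissions so only the owner can read the file.
chmod 400 password_file
# Sanity check: "password" is 8 characters, so the file must be exactly 8 bytes.
wc -c < password_file
```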
Let us run sample sqoop command using password file.
# password file is on the local file system, hence the file:// protocol
sqoop eval \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
-e "SELECT count(1) FROM orders"
Creating Sqoop Job
As we have created the password file, let us understand how to create a sqoop job.
We can use sqoop job --create to create the job.
Typically we create jobs for sqoop import, sqoop export, etc.
Once we have identified the relevant sqoop command, we can create the job with that command and the password file.
We also need to assign a unique id (name) to the sqoop job.
Here is an example of creating a job for a sqoop import command using sqoop job. Once the job is created, we can list the saved jobs using sqoop job --list.
sqoop job \
--create import_orders \
-- import \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
--table orders \
--warehouse-dir /user/training/sqoop_import/retail_db \
--delete-target-dir
# list all sqoop jobs
sqoop job --list
# show all details of a sqoop job
sqoop job --show import_orders
# execute a sqoop job
sqoop job --exec import_orders
# delete a sqoop job
sqoop job --delete import_orders
Sqoop Incremental Imports
Incremental import arguments:
| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.) |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append (primary key) and lastmodified (date). |
| --last-value (value) | Specifies the maximum value of the check column from the previous import. |
Check for data in case of incremental import
# password file is on the local file system
sqoop eval \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
-e "SELECT max(order_date) FROM orders"
# output:
# max(order_date)
# 2018-09-23 00:00:00.0
For HDFS incremental imports, we generally use append mode.
# append rows newer than the ones already imported
sqoop import \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
--table orders \
--warehouse-dir /user/training/sqoop_import/retail_db \
--append \
--where "order_id > 883"
# run an incremental append import; the map-reduce counters report how many records were retrieved
# (order_id is the primary key)
sqoop import \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
--table orders \
--warehouse-dir /user/training/sqoop_import/retail_db \
--check-column order_id \
--incremental append \
--last-value 883
If any new records exist, the import will retrieve more than 0 records.
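That check can be scripted in a wrapper. Below is a hedged sketch; sample_output stands in for the table that sqoop eval actually prints (the count 117 is made up for illustration):

```shell
# Stand-in for the output of:
#   sqoop eval ... -e "SELECT count(1) FROM orders WHERE order_id > 883"
sample_output='------------------------
| count(1)             |
------------------------
| 117                  |
------------------------'
# The value sits on the fourth line of the table; strip the pipes and spaces.
count=$(echo "$sample_output" | sed -n '4p' | tr -d '| ')
if [ "$count" -gt 0 ]; then
  echo "new rows: $count"
  # a real wrapper would now run the saved job, e.g. sqoop job --exec import_orders_incr
fi
```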
To create sqoop job 1:
# order_id is the primary key
sqoop job \
--create import_orders_incr \
-- import \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
--table orders \
--warehouse-dir /user/training/sqoop_import/retail_db \
--check-column order_id \
--incremental append \
--last-value 0
To get the properties related to the incremental load:
sqoop job --show import_orders_incr | grep incremental
The first run of sqoop job 1 will move all records from MySQL to HDFS.
When we rerun sqoop job --show import_orders_incr | grep incremental, it gives us three values:
incremental.last.value = 1000
incremental.col = order_id
incremental.mode = AppendRows
Now, add some records in MySQL and rerun sqoop job --exec import_orders_incr; the new records will be moved to HDFS. We can run sqoop job --show import_orders_incr | grep incremental again to see the updated values.
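The grep step can likewise be scripted to recover the stored last value; sample_show below stands in for the real output of sqoop job --show import_orders_incr | grep incremental:

```shell
# Stand-in for: sqoop job --show import_orders_incr | grep incremental
sample_show='incremental.last.value = 1000
incremental.col = order_id
incremental.mode = AppendRows'
# Extract the stored last value, e.g. to log it before the next run.
last_value=$(echo "$sample_show" | awk -F' = ' '$1 ~ /last\.value/ {print $2}')
echo "last imported order_id: $last_value"
```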
Incremental mode lastmodified (used when a date/timestamp column is present).
Check whether the target directory already exists; if it does, remove it first.
To create sqoop job 2:
# order_date is a not-null timestamp column
sqoop job \
--create import_orders_incr \
-- import \
--connect jdbc:mysql://dburl \
--username retail_user \
--password-file file:///home/training/password_file \
--table orders \
--warehouse-dir /user/training/sqoop_import/retail_db \
--check-column order_date \
--incremental lastmodified \
--last-value "2018-07-23 00:00:00.0"