Set up Oozie scheduling

Oozie is a workflow management tool available when the Hadoop Eco cluster type is set to Core Hadoop.
With Oozie, you can view bundles, workflows, coordinator lists, details, and logs. You can access Oozie from the quick link on the cluster detail page, or directly via the port listed below. The following explains how to configure Oozie scheduling.

Cluster type          Access port
Standard (Single)     Port 11000 on master node 1
HA                    Port 11000 on master node 3
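
For example, on a Standard (Single) cluster with the hadoopmst-hadoop-single-1 master hostname used in the examples below, the Oozie web console is typically reachable at the default /oozie context path (a sketch, assuming network access to that node):

Check the Oozie web console
curl http://hadoopmst-hadoop-single-1:11000/oozie/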

[Image: Oozie workflow list]

[Image: Oozie workflow job details]

[Image: Oozie workflow job content]

Prepare

To run an Oozie workflow, you need a workflow.xml file and its supporting execution files.
Upload these files to HDFS and point the oozie.wf.application.path property in wf.properties at that location, then run the job. An example Hive job layout is shown below.
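
The listing below assumes the files have already been uploaded. A minimal upload sketch (using the same hdfs:///wf_hive path as the listing) looks like this:

Upload workflow files to HDFS
$ hadoop fs -mkdir -p hdfs:///wf_hive
$ hadoop fs -put workflow.xml sample.hql hive-site.xml hdfs:///wf_hive/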

Hive job file layout and execution example
$ hadoop fs -ls hdfs:///wf_hive/
Found 3 items
-rw-r--r-- 2 ubuntu hadoop 22762 2022-03-30 05:11 hdfs:///wf_hive/hive-site.xml
-rw-r--r-- 2 ubuntu hadoop 168 2022-03-30 05:11 hdfs:///wf_hive/sample.hql
-rw-r--r-- 2 ubuntu hadoop 978 2022-03-30 05:11 hdfs:///wf_hive/workflow.xml

$ oozie job -run -config wf.properties
... [SLF4J log output] ...
job: 0000000-220330040805876-oozie-ubun-W

$ oozie job -info 0000000-220330040805876-oozie-ubun-W
... [SLF4J log output] ...
Job ID : 0000000-220330040805876-oozie-ubun-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : workflow_sample_job
App Path : hdfs:///wf_hive
Status : SUCCEEDED
Run : 0
User : ubuntu
Group : -
Created : 2022-03-30 05:12 GMT
Started : 2022-03-30 05:12 GMT
Last Modified : 2022-03-30 05:13 GMT
Ended : 2022-03-30 05:13 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                   Status    Ext ID                            Ext Status    Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-220330040805876-oozie-ubun-W@:start:         OK        -                                 OK            -
0000000-220330040805876-oozie-ubun-W@hive_action     OK        application_1648613240828_0002    SUCCEEDED     -
0000000-220330040805876-oozie-ubun-W@end             OK        -                                 OK            -

Run Oozie workflow

You can run an Oozie workflow to configure scheduling. Follow the steps below; a consolidated command sketch follows the list.

  1. Prepare workflow.xml and related execution files.

  2. Create a folder in HDFS and upload the related files.

    Upload files to HDFS
    hadoop fs -put [local files] [HDFS path]
  3. Set the execution path in the wf.properties file to the uploaded HDFS path.

    • Configure oozie.wf.application.path
  4. Run the Oozie job.

    Run Oozie job
    oozie job -run -config wf.properties
  5. Check the result.

    Check Oozie result
    oozie job -info [workflow id]
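
Putting the steps together, a minimal end-to-end sketch looks like this (it assumes the hdfs:///wf_hive layout and the wf.properties file shown in the Hive example below):

Full sequence example
hadoop fs -mkdir -p hdfs:///wf_hive
hadoop fs -put workflow.xml sample.hql hive-site.xml hdfs:///wf_hive/
# wf.properties must set oozie.wf.application.path=hdfs:///wf_hive
oozie job -run -config wf.properties
# use the workflow ID printed by the run command
oozie job -info [workflow id]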

Hive job example

In Standard (Single) and High Availability (HA) configurations, only the resource manager and name node values differ in the workflow.

workflow.xml
<workflow-app xmlns="uri:oozie:workflow:1.0" name="workflow_sample_job">
    <start to="hive_action" />
    <action name="hive_action">
        <hive xmlns="uri:oozie:hive-action:1.0">
            <resource-manager>hadoopmst-hadoop-single-1:8050</resource-manager>
            <name-node>hdfs://hadoopmst-hadoop-single-1</name-node>
            <job-xml>hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>hive.tez.container.size</name>
                    <value>2048</value>
                </property>
                <property>
                    <name>hive.tez.java.opts</name>
                    <value>-Xmx1600m</value>
                </property>
            </configuration>
            <script>sample.hql</script>
        </hive>
        <ok to="end" />
        <error to="kill" />
    </action>
    <kill name="kill">
        <message>Error!!</message>
    </kill>
    <end name="end" />
</workflow-app>
sample.hql
$ cat sample.hql
create table if not exists t1 (col1 string);
insert into table t1 values ('a'), ('b'), ('c');
select col1, count(*) from t1 group by col1;
show tables;
show databases;
wf.properties
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_hive
user.name=ubuntu

Spark job example

In Standard (Single) and High Availability (HA) configurations, the resource-manager and name-node values differ, as do the values passed via spark-opts, so take care when configuring these parts.

workflow.xml
<workflow-app xmlns="uri:oozie:workflow:1.0" name="workflow_sample_job">
    <start to="spark_action" />

    <action name="spark_action">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>hadoopmst-hadoop-single-1:8050</resource-manager>
            <name-node>hdfs://hadoopmst-hadoop-single-1</name-node>
            <master>yarn-client</master>
            <name>Spark Example</name>
            <class>org.apache.spark.examples.SparkPi</class>
            <jar>/opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar</jar>
            <spark-opts>--executor-memory 2G --conf spark.hadoop.yarn.resourcemanager.address=hadoopmst-hadoop-single-1:8050 --conf spark.yarn.stagingDir=hdfs://hadoopmst-hadoop-single-1/user/ubuntu --conf spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/etc/hadoop/conf --conf spark.io.compression.codec=snappy</spark-opts>
            <arg>100</arg>
        </spark>
        <ok to="end" />
        <error to="kill" />
    </action>

    <kill name="kill">
        <message>Error!!</message>
    </kill>

    <end name="end" />
</workflow-app>
wf.properties
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_spark
user.name=ubuntu
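
As with the Hive example, workflow.xml must be uploaded to the path referenced by oozie.wf.application.path before running the job; in this sketch only the workflow file is uploaded, since the example jar path points to a location on the cluster nodes rather than HDFS. A minimal sketch (assuming the hdfs:///wf_spark path above):

Upload and run the Spark workflow
hadoop fs -mkdir -p hdfs:///wf_spark
hadoop fs -put workflow.xml hdfs:///wf_spark/
oozie job -run -config wf.properties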

Shell job example

In Standard (Single) and High Availability (HA) configurations, only the resource manager and name node values differ. All other settings are the same.

workflow.xml
<workflow-app xmlns='uri:oozie:workflow:1.0' name='shell-wf'>
    <start to='shell1' />
    <action name='shell1'>
        <shell xmlns="uri:oozie:shell-action:1.0">
            <resource-manager>hadoopmst-hadoop-single-1:8050</resource-manager>
            <name-node>hdfs://hadoopmst-hadoop-single-1</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>default</value>
                </property>
            </configuration>
            <exec>echo.sh</exec>
            <argument>A</argument>
            <argument>B</argument>
            <file>echo.sh#echo.sh</file>
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
wf.properties
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_shell
user.name=ubuntu
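
The workflow above runs echo.sh, which is not shown in this guide. The script must be uploaded to hdfs:///wf_shell alongside workflow.xml so the <file> element can ship it to the action. A minimal, hypothetical version that echoes the two arguments passed by the workflow could look like this:

echo.sh (hypothetical example)
#!/bin/bash
# Print the arguments passed from the workflow (A and B in the example above)
echo "arg1: $1"
echo "arg2: $2"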