Oozie Scheduling Setup
Oozie is a workflow job tool provided when the Hadoop Eco cluster type is Core Hadoop.
With Oozie you can view the lists, details, and logs of bundles, workflows, and coordinators. You can access Oozie by clicking the quick link on the cluster detail page. Oozie scheduling is set up as follows.
| Cluster type | Access port |
|---|---|
| Standard (Single) | Port 11000 on master node 1 |
| HA | Port 11000 on master node 3 |
(Figure: Oozie workflow list)
(Figure: Oozie workflow job information)
(Figure: Oozie workflow job content)
Prerequisites
To run an Oozie workflow job, you need a workflow.xml file and the supporting files it executes. Upload the prepared files to HDFS, set their path in the wf.properties file, and run the job. A Hive job takes the following form:
$ hadoop fs -ls hdfs:///wf_hive/
Found 3 items
-rw-r--r-- 2 ubuntu hadoop 22762 2022-03-30 05:11 hdfs:///wf_hive/hive-site.xml
-rw-r--r-- 2 ubuntu hadoop 168 2022-03-30 05:11 hdfs:///wf_hive/sample.hql
-rw-r--r-- 2 ubuntu hadoop 978 2022-03-30 05:11 hdfs:///wf_hive/workflow.xml
$ oozie job -run -config wf.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/oozie-5.2.1/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/oozie-5.2.1/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
job: 0000000-220330040805876-oozie-ubun-W
$ oozie job -info 0000000-220330040805876-oozie-ubun-W
Job ID : 0000000-220330040805876-oozie-ubun-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : workflow_sample_job
App Path : hdfs:///wf_hive
Status : SUCCEEDED
Run : 0
User : ubuntu
Group : -
Created : 2022-03-30 05:12 GMT
Started : 2022-03-30 05:12 GMT
Last Modified : 2022-03-30 05:13 GMT
Ended : 2022-03-30 05:13 GMT
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-220330040805876-oozie-ubun-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000000-220330040805876-oozie-ubun-W@hive_action OK application_1648613240828_0002 SUCCEEDED -
------------------------------------------------------------------------------------------------------------------------------------
0000000-220330040805876-oozie-ubun-W@end OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
Running an Oozie Workflow
You can set up Oozie scheduling by running an Oozie workflow as follows.
- Prepare workflow.xml and the files required to run it.
- Create a folder in HDFS and upload the related files: `hadoop fs -put`
- Set the application path in the wf.properties file to the upload path: `oozie.wf.application.path`
- Run the Oozie job: `oozie job -run -config wf.properties`
- Check the result: `oozie job -info [workflow id]`
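In addition to `-info`, the Oozie CLI can stream a job's log, which is often the quickest way to diagnose a failed action (the job ID below is a placeholder for an ID like the one returned by `-run`):

```
$ oozie job -log [workflow id]
```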
Hive Job Example
For the Standard (Single) and High Availability (HA) types, only the resource manager and name node values in the workflow differ; everything else is identical.
- Standard (Single)
- High Availability (HA)
<workflow-app xmlns="uri:oozie:workflow:1.0" name="workflow_sample_job">
<start to="hive_action" />
<action name="hive_action">
<hive xmlns="uri:oozie:hive-action:1.0">
<resource-manager>hadoopmst-hadoop-single-1:8050</resource-manager>
<name-node>hdfs://hadoopmst-hadoop-single-1</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>hive.tez.container.size</name>
<value>2048</value>
</property>
<property>
<name>hive.tez.java.opts</name>
<value>-Xmx1600m</value>
</property>
</configuration>
<script>sample.hql</script>
</hive>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Error!!</message>
</kill>
<end name="end" />
</workflow-app>
$ cat sample.hql
create table if not exists t1 (col1 string);
insert into table t1 values ('a'), ('b'), ('c');
select col1, count(*) from t1 group by col1;
show tables;
show databases;
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_hive
user.name=ubuntu
<workflow-app xmlns="uri:oozie:workflow:1.0" name="workflow_sample_job">
<start to="hive_action" />
<action name="hive_action">
<hive xmlns="uri:oozie:hive-action:1.0">
<resource-manager>yarn-cluster:8050</resource-manager>
<name-node>hdfs://hadoop-ha</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>hive.tez.container.size</name>
<value>2048</value>
</property>
<property>
<name>hive.tez.java.opts</name>
<value>-Xmx1600m</value>
</property>
</configuration>
<script>sample.hql</script>
</hive>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Error!!</message>
</kill>
<end name="end" />
</workflow-app>
$ cat sample.hql
create table if not exists t1 (col1 string);
insert into table t1 values ('a'), ('b'), ('c');
select col1, count(*) from t1 group by col1;
show tables;
show databases;
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_hive
user.name=ubuntu
Spark Job Example
For the Standard (Single) and High Availability (HA) types, the resource manager and name node values in the workflow differ, and the values passed via spark-opts also differ. Pay attention to these parts when running the job.
- Standard (Single)
- High Availability (HA)
<workflow-app xmlns="uri:oozie:workflow:1.0" name="workflow_sample_job">
<start to="spark_action" />
<action name="spark_action">
<spark xmlns="uri:oozie:spark-action:1.0">
<resource-manager>hadoopmst-hadoop-single-1:8050</resource-manager>
<name-node>hdfs://hadoopmst-hadoop-single-1</name-node>
<master>yarn-client</master>
<name>Spark Example</name>
<class>org.apache.spark.examples.SparkPi</class>
<jar>/opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar</jar>
<spark-opts>--executor-memory 2G --conf spark.hadoop.yarn.resourcemanager.address=hadoopmst-hadoop-single-1:8050 --conf spark.yarn.stagingDir=hdfs://hadoopmst-hadoop-single-1/user/ubuntu --conf spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/etc/hadoop/conf --conf spark.io.compression.codec=snappy</spark-opts>
<arg>100</arg>
</spark>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Error!!</message>
</kill>
<end name="end" />
</workflow-app>
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_spark
user.name=ubuntu
<workflow-app xmlns="uri:oozie:workflow:1.0" name="workflow_sample_job">
<start to="spark_action" />
<action name="spark_action">
<spark xmlns="uri:oozie:spark-action:1.0">
<resource-manager>yarn-cluster:8050</resource-manager>
<name-node>hdfs://hadoop-ha</name-node>
<master>yarn-client</master>
<name>Spark Example</name>
<class>org.apache.spark.examples.SparkPi</class>
<jar>/opt/spark/examples/jars/spark-examples_2.11-2.4.6.jar</jar>
<spark-opts>--executor-memory 2G --conf spark.hadoop.yarn.resourcemanager.address=yarn-cluster:8050 --conf spark.yarn.stagingDir=hdfs://hadoop-ha/user/ubuntu --conf spark.yarn.appMasterEnv.HADOOP_CONF_DIR=/etc/hadoop/conf --conf spark.io.compression.codec=snappy</spark-opts>
<arg>100</arg>
</spark>
<ok to="end" />
<error to="kill" />
</action>
<kill name="kill">
<message>Error!!</message>
</kill>
<end name="end" />
</workflow-app>
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_spark
user.name=ubuntu
Shell Job Example
For the Standard (Single) and High Availability (HA) types, only the resource manager and name node values in the workflow differ; everything else is identical.
- Standard (Single)
- High Availability (HA)
<workflow-app xmlns='uri:oozie:workflow:1.0' name='shell-wf'>
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:1.0">
<resource-manager>hadoopmst-hadoop-single-1:8050</resource-manager>
<name-node>hdfs://hadoopmst-hadoop-single-1</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>echo.sh</exec>
<argument>A</argument>
<argument>B</argument>
<file>echo.sh#echo.sh</file>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_shell
user.name=ubuntu
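The workflow above ships an echo.sh script to the worker node via the `<file>` tag, but this guide does not show its contents. A minimal sketch of what such a script might look like (the body below is an assumption, not the actual script) simply prints the arguments supplied by the `<argument>` tags:

```shell
#!/bin/bash
# Hypothetical echo.sh: print each argument the workflow passes in.
set -- A B            # simulate the arguments Oozie would pass (A, B)
result=""
for arg in "$@"; do
  echo "arg: $arg"
  result="$result$arg"
done
```

Any executable that writes to stdout works the same way; the output appears in the action's YARN container log.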
<workflow-app xmlns='uri:oozie:workflow:1.0' name='shell-wf'>
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:1.0">
<resource-manager>yarn-cluster:8050</resource-manager>
<name-node>hdfs://hadoop-ha</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<exec>echo.sh</exec>
<argument>A</argument>
<argument>B</argument>
<file>echo.sh#echo.sh</file>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
oozie.use.system.libpath=true
oozie.wf.application.path=hdfs:///wf_shell
user.name=ubuntu
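The examples above each run once per submission. Time-based scheduling in Oozie is done with a coordinator, which triggers a workflow on a fixed frequency. The sketch below is an outline under assumptions: the app name, dates, daily frequency, and the hypothetical hdfs:///coord_hive upload path are illustrative values, not settings shipped with the cluster.

```
<coordinator-app xmlns="uri:oozie:coordinator:0.5" name="coord_sample_job"
                 frequency="${coord:days(1)}"
                 start="2022-04-01T00:00Z" end="2022-12-31T00:00Z" timezone="UTC">
  <action>
    <workflow>
      <!-- reuse a workflow already uploaded to HDFS, e.g. the Hive example -->
      <app-path>hdfs:///wf_hive</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Upload coordinator.xml to HDFS, point `oozie.coord.application.path` at that directory in a properties file, and submit it with `oozie job -run -config`, the same way as a workflow.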