Use Spark History Server

In Hadoop Eco, you can use the Spark History Server to monitor Spark jobs. The Spark History Server presents the logs of the Spark jobs you have run through the Spark web UI, so you can check each job's execution status and results.

caution

Depending on the log upload cycle, the data shown may differ from the data in the Spark History Server running inside the cluster.

Spark History Server features

The Spark History Server provides the following features:

- Monitoring Spark jobs both inside and outside the cluster
- Viewing event and container logs of running Spark applications
- Viewing the history of Spark jobs from terminated clusters

Collect Spark History Server logs

To use the Spark History Server, Hadoop Eco collects and stores the following two types of user logs:

Log type	Description
Spark Apps logs	Event logs collected when Spark jobs are executed, storing job execution times, task status, etc.
YARN container logs	Logs generated by Spark executors during job execution, stored as stdout and stderr in YARN container logs.

These logs are stored in the location that the user specifies when creating the cluster.
(Log storage can be configured only during cluster creation; it cannot be changed afterward.)

However, logs may not be collected properly under the following conditions:

Spark History Server execution conditions

Hadoop Eco cluster status

If the cluster is not in the Running state, the Spark History Server may not run properly.
However, it can still be run for a terminated cluster, provided that the cluster's logs were collected correctly.

Log collection settings

If the following settings are changed by the user, logs may not be collected properly.
- Spark's spark-defaults.conf settings

spark.eventLog.enabled                  true
spark.eventLog.dir                      hdfs:///var/log/spark/apps
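
As a minimal sketch, the two settings above could be checked with a short Python script before relying on log collection. The function names `parse_spark_defaults` and `find_changed_settings` are hypothetical and not part of Hadoop Eco or Spark. The parser assumes the whitespace-separated `key value` form shown above.

```python
# Expected values copied from the spark-defaults.conf settings listed above.
EXPECTED_SPARK = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "hdfs:///var/log/spark/apps",
}


def parse_spark_defaults(text):
    """Parse spark-defaults.conf text into a dict (hypothetical helper).

    Assumes each setting is written as 'key<whitespace>value'.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def find_changed_settings(text, expected=EXPECTED_SPARK):
    """Return {key: (expected, actual)} for settings that differ or are missing."""
    actual = parse_spark_defaults(text)
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


if __name__ == "__main__":
    sample = """
    spark.eventLog.enabled                  true
    spark.eventLog.dir                      hdfs:///var/log/spark/apps
    """
    # An empty dict means neither setting has been changed.
    print(find_changed_settings(sample))
```

A non-empty result lists each setting that differs from the values above, which is the situation in which logs may not be collected properly.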

- YARN's yarn-site.xml settings

How to install Kerberos and Ranger
<configuration>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/hadoop/yarn/log</value>
    </property>

    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
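
The following sketch shows one way to check these three properties and to read the retention period in days. It uses only Python's standard `xml.etree.ElementTree` module. The helper names `read_yarn_site`, `find_changed_properties`, and `retention_days` are hypothetical and are not provided by Hadoop Eco or YARN.

```python
import xml.etree.ElementTree as ET

# Expected values copied from the yarn-site.xml settings listed above.
EXPECTED_YARN = {
    "yarn.log-aggregation-enable": "true",
    "yarn.nodemanager.log-dirs": "/hadoop/yarn/log",
    "yarn.nodemanager.log.retain-seconds": "604800",
}

SECONDS_PER_DAY = 86400


def read_yarn_site(xml_text):
    """Parse yarn-site.xml text into a dict of property name -> value."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def find_changed_properties(xml_text, expected=EXPECTED_YARN):
    """Return {name: (expected, actual)} for properties that differ or are missing."""
    actual = read_yarn_site(xml_text)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


def retention_days(props):
    """Convert yarn.nodemanager.log.retain-seconds to days."""
    return int(props["yarn.nodemanager.log.retain-seconds"]) / SECONDS_PER_DAY


if __name__ == "__main__":
    sample = """
    <configuration>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.nodemanager.log-dirs</name>
            <value>/hadoop/yarn/log</value>
        </property>
        <property>
            <name>yarn.nodemanager.log.retain-seconds</name>
            <value>604800</value>
        </property>
    </configuration>
    """
    print(find_changed_properties(sample))
    print(retention_days(read_yarn_site(sample)))
```

With the values shown above, 604800 seconds divided by 86400 seconds per day gives a retention period of 7 days.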

caution

If the log retention period is set too short, it may be difficult to verify the logs.

Run Spark History Server

1. Go to the KakaoCloud console > Analytics > Hadoop Eco menu.
2. In the Cluster menu, select the [Create cluster] button at the top right.
3. In Step 3: Detailed Settings > Cluster Detailed Settings, change the log storage setting to [Enabled].
4. Select an Object Storage bucket, specify the desired path, and select the [Create] button.
5. Once the status of the created Hadoop Eco cluster changes to Running, select the cluster to open its detail page.
6. On the detail page, select the [Run Spark History Server] button at the top right.
   - The server starts within 5 minutes of selecting the button. Once startup is complete, the button changes to [Open Spark History Server] and the server runs automatically.
   - The Spark History Server remains active for a certain period while there are user requests. If there are no requests, it shuts down automatically.
   - Even if the server shuts down, the user logs are not deleted, and you can restart the server with the [Run Spark History Server] button.

caution

In the case of large log files, the Spark History Server may terminate due to memory shortage while running.
If the stored logs are deleted, the Spark History Server may not function properly.
If security features are enabled, log collection errors may occur depending on the settings.
