Use Spark History Server
In Hadoop Eco, the Spark History Server can be used to monitor Spark jobs. It presents the logs of Spark jobs executed by the user through the Spark web UI, allowing you to check the execution status and results of those jobs.
Depending on the log upload cycle, the data shown here may differ from that of the cluster's internal Spark History Server.
Spark History Server features
The Spark History Server provides the following features:
- Monitoring Spark jobs both inside and outside the cluster
- Viewing event and container logs of running Spark applications
- Viewing the history of Spark jobs from terminated clusters
Collect Spark History Server logs
To use the Spark History Server, Hadoop Eco collects and stores the following two types of user logs:
| Log type | Description |
|---|---|
| Spark Apps logs | Event logs collected when Spark jobs are executed; they record job execution times, task status, etc. |
| YARN container logs | Logs generated by Spark executors during job execution; stored as stdout and stderr in YARN container logs. |
These logs are stored in the location specified by the user when the cluster is created.
(Log storage can only be configured during cluster creation; it cannot be changed afterward.)
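Spark Apps event logs follow Spark's standard event-log format: newline-delimited JSON, where each line carries an `Event` field such as `SparkListenerApplicationStart` or `SparkListenerApplicationEnd`. As a rough illustration of what these stored logs contain, the sketch below computes an application's run time from such a file (the function name `app_duration_ms` is illustrative, not part of Hadoop Eco):

```python
import json

def app_duration_ms(event_log_path):
    """Scan a Spark event log (newline-delimited JSON) and return the
    application's run time in milliseconds, or None if either the
    start or end event is missing."""
    start = end = None
    with open(event_log_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("Event") == "SparkListenerApplicationStart":
                start = event["Timestamp"]
            elif event.get("Event") == "SparkListenerApplicationEnd":
                end = event["Timestamp"]
    if start is not None and end is not None:
        return end - start
    return None
```

The Spark History Server performs this kind of scan over the collected event logs to reconstruct each job's timeline.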
However, logs may not be collected properly under the following conditions:
Spark History Server execution conditions
- Hadoop Eco cluster status
- If the cluster is not in a Running state, the Spark History Server may not run properly.
- However, it can be executed on a cluster that has been terminated but whose logs have been collected correctly.
- Log collection settings
  - If the following settings are changed by the user, logs may not be collected properly.
  - Spark's `spark-defaults.conf` settings:

    ```properties
    spark.eventLog.enabled true
    spark.eventLog.dir hdfs:///var/log/spark/apps
    ```
  - `yarn-site.xml` settings:
    ```xml
    <configuration>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/hadoop/yarn/log</value>
      </property>
      <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
      </property>
    </configuration>
    ```
If the log retention period is set too short, logs may be deleted before you can review them.
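The `yarn.nodemanager.log.retain-seconds` value above, 604800, corresponds to a 7-day retention period. When choosing your own value, the conversion is straightforward (the helper name `retention_seconds` is illustrative):

```python
def retention_seconds(days):
    """Convert a retention period in days to the seconds value expected
    by yarn.nodemanager.log.retain-seconds."""
    return days * 24 * 60 * 60

# 7 days -> 604800 seconds, the example value used above.
print(retention_seconds(7))  # → 604800
```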
Run Spark History Server
1. Go to the KakaoCloud Console > Analytics > Hadoop Eco menu.
2. In the Cluster menu, click the [Create Cluster] button located at the top right.
3. In Step 3: Detailed Settings > Cluster Detailed Settings, change the log storage setting to [Enabled].
4. Select an Object Storage bucket, specify the desired path, and click the [Create] button.
5. Once the created Hadoop Eco cluster status changes to Running, select the cluster and navigate to the cluster's detail page.
6. On the detail page, click the Run Spark History Server button located at the top right.
- After clicking the button, the server will start within 5 minutes. Once the process is complete, the button will change to Open Spark History Server, and the server will automatically run.
- The Spark History Server will remain active for a certain period if there are user requests. If there are no requests, it will shut down automatically.
- Even if the server shuts down, the user logs will not be deleted, and the server can be restarted via the Run Spark History Server button.
- In the case of large log files, the Spark History Server may run out of memory and terminate while running.
- If the stored logs are deleted, the Spark History Server may not function properly.
- If security features are enabled, log collection errors may occur depending on the settings.
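Once the server is open, job history can also be queried programmatically through Spark's monitoring REST API, which every Spark History Server exposes under `/api/v1`. The sketch below builds the endpoint that lists applications and extracts their IDs from its JSON response; the host and port are placeholders, since the actual address is the one reached via the Open Spark History Server button:

```python
import json
from urllib.request import urlopen

def applications_url(base_url):
    """Return the Spark monitoring REST endpoint that lists the
    applications known to a Spark History Server."""
    return base_url.rstrip("/") + "/api/v1/applications"

def list_app_ids(apps_json):
    """Extract application IDs from the JSON list returned by the
    /api/v1/applications endpoint."""
    return [app["id"] for app in apps_json]

# Usage against a running server (host and port are placeholders):
#   apps = json.load(urlopen(applications_url("http://<history-server>:18080")))
#   print(list_app_ids(apps))
```

This can be a convenient alternative to the web UI when checking many jobs at once, subject to the same availability caveats listed above.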