Configure scheduling
Configure YARN scheduler
This section describes how to configure the two main YARN schedulers: the Capacity Scheduler and the Fair Scheduler.
Define scheduler types
Capacity Scheduler
The Capacity Scheduler is the default scheduler in YARN. It organizes queues into a tree and allocates a share of the available resources to each queue.
Capacity Scheduler configuration keys
Configuration key | Description |
---|---|
yarn.scheduler.capacity.maximum-applications | The maximum number of applications that can be active at the same time, counting both pending and running applications. |
yarn.scheduler.capacity.maximum-am-resource-percent | The maximum share of cluster resources that can be used to run Application Masters (AMs), specified as a float (for example, 0.1 means 10%). |
yarn.scheduler.capacity.root.queues | Names of the child queues registered under the root queue. |
yarn.scheduler.capacity.root.[queue_name].maximum-am-resource-percent | The maximum share of resources that AMs can use in this queue. Overrides the cluster-wide value for the queue. |
yarn.scheduler.capacity.root.[queue_name].capacity | The queue's capacity as a percentage. The capacities of sibling queues must add up to 100. |
yarn.scheduler.capacity.root.[queue_name].user-limit-factor | The multiple of the queue's capacity that a single user can consume. A user's usage still cannot exceed the queue's maximum-capacity. |
yarn.scheduler.capacity.root.[queue_name].maximum-capacity | The maximum capacity, as a percentage, that the queue can grow to when other queues leave resources unused. |
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>prd,stg</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.prd.capacity</name>
<value>80</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.stg.capacity</name>
<value>20</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.prd.user-limit-factor</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.stg.user-limit-factor</name>
<value>2</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.prd.maximum-capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.stg.maximum-capacity</name>
<value>30</value>
</property>
</configuration>
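Because queues form a tree, a queue such as prd can declare child queues of its own. The fragment below is a minimal sketch of that, assuming two hypothetical child queues named batch and adhoc; the queue names and percentages are illustrative and are not part of the configuration above. The properties would go inside the same `<configuration>` element.

```xml
<!-- Hypothetical child queues under root.prd; names and values are illustrative. -->
<property>
  <name>yarn.scheduler.capacity.root.prd.queues</name>
  <value>batch,adhoc</value>
</property>
<!-- Child capacities are percentages of the parent queue and add up to 100. -->
<property>
  <name>yarn.scheduler.capacity.root.prd.batch.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prd.adhoc.capacity</name>
  <value>30</value>
</property>
```

With a layout like this, prd becomes a parent queue, so applications would be submitted to its leaf queues (here batch or adhoc) rather than to prd itself.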
Fair Scheduler
The Fair Scheduler ensures that submitted jobs share resources equally. When jobs are submitted to a queue, the scheduler rebalances resources so that they are divided evenly across all running jobs.
Fair Scheduler configuration keys
Configuration key | Description |
---|---|
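yarn.scheduler.fair.allocation.file | The path to the Fair Scheduler allocation file, which defines the queues and their properties. |
yarn.scheduler.fair.user-as-default-queue | Whether to use the submitting user's name as the queue name when no queue name is specified. |
yarn.scheduler.fair.preemption | Whether to enable preemption, which lets the scheduler reclaim resources from queues that are using more than their fair share. |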
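These keys are set in yarn-site.xml. The fragment below is a minimal sketch of how they might look; the allocation file path is a hypothetical location and the boolean values are illustrative choices, so adjust both to your cluster.

```xml
<!-- Path below is a hypothetical location for the allocation file. -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
<!-- false: jobs without a queue name are not placed in a per-user queue. -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<!-- true: allow the scheduler to preempt containers. -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
```

The allocation file that yarn.scheduler.fair.allocation.file points to defines the queues themselves, for example: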
<?xml version="1.0"?>
<allocations>
<queue name="dev">
<minResources>10000 mb,10vcores</minResources>
<maxResources>60000 mb,30vcores</maxResources>
<maxRunningApps>50</maxRunningApps>
<maxAMShare>1.0</maxAMShare>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
<queue name="prd">
<minResources>10000 mb,10vcores</minResources>
<maxResources>60000 mb,30vcores</maxResources>
<maxRunningApps>100</maxRunningApps>
<maxAMShare>0.1</maxAMShare>
<weight>2.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
<queue name="sub_prd">
<aclSubmitApps>charlie</aclSubmitApps>
<minResources>5000 mb,0vcores</minResources>
</queue>
</queue>
<user name="sample_user">
<maxRunningApps>30</maxRunningApps>
</user>
<userMaxAppsDefault>5</userMaxAppsDefault>
<queueMaxAMShareDefault>0.2</queueMaxAMShareDefault>
<queuePlacementPolicy>
<rule name="specified"/>
<rule name="primaryGroup" create="false"/>
<rule name="default" queue="dev"/>
</queuePlacementPolicy>
</allocations>
Change scheduler
The default scheduler for Hadoop Eco is the Capacity Scheduler. To switch to the Fair Scheduler, modify the yarn-site.xml configuration and restart the service.
<!-- Capacity Scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<!-- Fair Scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
Update scheduler configuration
When you modify the configuration of individual queues, the ResourceManager can apply the changes while the service is running, without a restart. After changing the settings in the XML file, run the following command to apply the changes.
yarn rmadmin -refreshQueues
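As a sketch of the kind of change this command picks up, the fragment below shifts ten percentage points of capacity from prd to stg in the Capacity Scheduler configuration (typically capacity-scheduler.xml). The new values are illustrative assumptions. The two sibling capacities still add up to 100, and stg stays within the maximum-capacity of 30 used in the earlier example.

```xml
<!-- Illustrative edit: prd lowered from 80 to 70, stg raised from 20 to 30. -->
<property>
  <name>yarn.scheduler.capacity.root.prd.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.stg.capacity</name>
  <value>30</value>
</property>
```

After saving an edit like this, running yarn rmadmin -refreshQueues asks the ResourceManager to reload the queue definitions.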