Configure scheduling
Configure YARN scheduler
This page explains how to configure two YARN schedulers: Capacity Scheduler (the default) and Fair Scheduler.
Scheduler types
Capacity Scheduler
Capacity Scheduler is YARN's default scheduler. It manages cluster resources through a tree of queues that starts at the root queue, and each queue is allocated a share (capacity) of its parent queue's resources.
Configuration keys for Capacity Scheduler
Configuration key | Description |
---|---|
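yarn.scheduler.capacity.maximum-applications | Maximum number of applications that can be pending or running at the same time. |
yarn.scheduler.capacity.maximum-am-resource-percent | Maximum fraction of cluster resources that can be used to run Application Masters (AMs), specified as a float (for example, 0.1 = 10%). This indirectly limits the number of concurrently running applications. |
yarn.scheduler.capacity.root.queues | Comma-separated list of the child queues under the root queue. |
yarn.scheduler.capacity.root.[queue_name].maximum-am-resource-percent | Maximum fraction of the queue's resources that AMs can use. Overrides the global setting for this queue. |
yarn.scheduler.capacity.root.[queue_name].capacity | Guaranteed capacity of the queue, as a percentage of its parent queue. The capacities of sibling queues must add up to 100. |
yarn.scheduler.capacity.root.[queue_name].user-limit-factor | Multiple of the queue's capacity that a single user can consume. With the default of 1, one user cannot exceed the queue's capacity. A user can never exceed the queue's maximum-capacity. |
yarn.scheduler.capacity.root.[queue_name].maximum-capacity | Upper limit on the resources the queue can use, as a percentage of its parent queue. When other queues are idle, a queue can grow beyond its capacity up to this limit. |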
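The example configuration below sets only the global AM limit. As a minimal sketch, a per-queue override for the stg queue could look like the following; the value 0.3 is an assumption chosen for illustration, and because the value is a fraction it means AMs in stg may use up to 30% of that queue's resources.

```xml
<!-- capacity-scheduler.xml (fragment) -->
<!-- Hypothetical override: AMs in root.stg may use up to 30% of that queue's resources,
     instead of the global yarn.scheduler.capacity.maximum-am-resource-percent value -->
<property>
  <name>yarn.scheduler.capacity.root.stg.maximum-am-resource-percent</name>
  <value>0.3</value>
</property>
```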
Capacity Scheduler configuration example
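<configuration>
  <!-- At most 10,000 applications can be pending or running at the same time -->
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
  </property>
  <!-- Application Masters may use at most 10% of cluster resources -->
  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
  </property>
  <!-- DefaultResourceCalculator compares resources by memory only -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>
  <!-- Two child queues under root -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prd,stg</value>
  </property>
  <!-- Guaranteed capacities of sibling queues must add up to 100 (80 + 20) -->
  <property>
    <name>yarn.scheduler.capacity.root.prd.capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.stg.capacity</name>
    <value>20</value>
  </property>
  <!-- A single user in prd can use at most the queue's capacity (80%) -->
  <property>
    <name>yarn.scheduler.capacity.root.prd.user-limit-factor</name>
    <value>1</value>
  </property>
  <!-- A single user in stg may use 2 x 20% = 40%, but maximum-capacity caps this at 30% -->
  <property>
    <name>yarn.scheduler.capacity.root.stg.user-limit-factor</name>
    <value>2</value>
  </property>
  <!-- prd can grow to the whole cluster when stg is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.prd.maximum-capacity</name>
    <value>100</value>
  </property>
  <!-- stg can grow to at most 30% of the cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.stg.maximum-capacity</name>
    <value>30</value>
  </property>
</configuration>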
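Because queues form a tree, a queue such as prd can itself be split into child queues. The following is a minimal sketch under assumptions: the child queue names etl and adhoc and their capacities are hypothetical, chosen only to show how nested capacities are declared.

```xml
<!-- capacity-scheduler.xml (fragment): split root.prd into two hypothetical child queues -->
<property>
  <name>yarn.scheduler.capacity.root.prd.queues</name>
  <value>etl,adhoc</value>
</property>
<!-- Child capacities are percentages of the parent (prd) and must add up to 100 -->
<property>
  <name>yarn.scheduler.capacity.root.prd.etl.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prd.adhoc.capacity</name>
  <value>30</value>
</property>
```

With prd guaranteed 80% of the cluster, etl would be guaranteed 0.8 × 70% = 56% and adhoc 0.8 × 30% = 24% of cluster resources. Applications are submitted to leaf queues such as etl, not to the parent queue prd.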
Fair Scheduler
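Fair Scheduler assigns resources so that, over time, all applications receive on average an equal share of the cluster. A single running application can use the whole cluster; when more applications are submitted, resources that free up are reassigned to the new applications until each one has roughly the same amount.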
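Shares can also be weighted per queue. As a minimal sketch, the following allocation file gives one queue three times the share of the other; the queue names analytics and batch and the weights are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!-- Fair Scheduler allocation file (sketch). Queue names are hypothetical. -->
<allocations>
  <!-- When only these two queues have running applications, analytics receives
       about 3/4 of the resources and batch about 1/4, in proportion to weight -->
  <queue name="analytics">
    <weight>3.0</weight>
  </queue>
  <queue name="batch">
    <weight>1.0</weight>
  </queue>
</allocations>
```

If only one of the queues has work, that queue can still use the idle resources of the other.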
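Configuration keys for Fair Scheduler
Configuration key | Description |
---|---|
yarn.scheduler.fair.allocation.file | Path to the allocation file that defines the queues and their settings. A relative path is looked up on the classpath. Defaults to fair-scheduler.xml. |
yarn.scheduler.fair.user-as-default-queue | Whether to use the submitting user's name as the queue name when no queue is specified. If false, such applications go to a shared queue named default. Defaults to true. Ignored if the allocation file defines a queue placement policy. |
yarn.scheduler.fair.preemption | Whether to enable preemption, which lets the scheduler reclaim containers from queues running above their fair share. Defaults to false. |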
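These keys are set in yarn-site.xml, while queue definitions live in the allocation file. A minimal sketch follows; the file path and the chosen values are assumptions for illustration, not required settings.

```xml
<!-- yarn-site.xml (fragment). The path and values below are example assumptions. -->
<!-- Absolute path to the allocation file; a relative name is looked up on the classpath -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
<!-- false: applications submitted without a queue name go to the shared "default" queue -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<!-- Allow containers to be preempted so that starved queues can reach their fair share -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
```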
Fair Scheduler configuration example
<?xml version="1.0"?>
<allocations>
  <queue name="dev">
    <minResources>10000 mb,10vcores</minResources>
    <maxResources>60000 mb,30vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <!-- AMs may use up to this queue's entire fair share -->
    <maxAMShare>1.0</maxAMShare>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <queue name="prd">
    <minResources>10000 mb,10vcores</minResources>
    <maxResources>60000 mb,30vcores</maxResources>
    <maxRunningApps>100</maxRunningApps>
    <!-- AMs may use at most 10% of this queue's fair share -->
    <maxAMShare>0.1</maxAMShare>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <!-- Child queue root.prd.sub_prd; user charlie is granted submit access -->
    <queue name="sub_prd">
      <aclSubmitApps>charlie</aclSubmitApps>
      <minResources>5000 mb,0vcores</minResources>
    </queue>
  </queue>
  <!-- Per-user override of the running-application limit -->
  <user name="sample_user">
    <maxRunningApps>30</maxRunningApps>
  </user>
  <!-- Defaults for users and queues that do not set their own limits -->
  <userMaxAppsDefault>5</userMaxAppsDefault>
  <queueMaxAMShareDefault>0.2</queueMaxAMShareDefault>
  <!-- Rules are tried in order: the queue named at submission, then a queue named
       after the user's primary group (only if it already exists), otherwise dev -->
  <queuePlacementPolicy>
    <rule name="specified"/>
    <rule name="primaryGroup" create="false"/>
    <rule name="default" queue="dev"/>
  </queuePlacementPolicy>
</allocations>
Change scheduler
Hadoop Eco uses Capacity Scheduler by default. To switch to Fair Scheduler, change yarn.resourcemanager.scheduler.class in yarn-site.xml and restart the ResourceManager.
Scheduler change example
<!-- Capacity Scheduler (default) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<!-- Fair Scheduler: use this property instead of the one above, not in addition to it -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
Update scheduler configuration
Queue settings can be updated while the ResourceManager is running, without a restart. After editing the scheduler's XML configuration file, run the following command to reload the queue configuration:
Update scheduler configuration example
yarn rmadmin -refreshQueues