Dominant resource fairness

From HandWiki

Dominant resource fairness (DRF) is a rule for fair division. It is particularly useful for dividing computing resources in among users in cloud computing environments, where each user may require a different combination of resources. DRF was presented by Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker and Ion Stoica in 2011.[1]

Motivation

In an environment with a single resource, a widely used criterion is max-min fairness, which aims to maximize the minimum amount of resource given to a user. But in cloud computing, it is required to share different types of resource, such as: memory, CPU, bandwidth and disk-space. Previous fair schedulers, such as in Apache Hadoop, reduced the multi-resource setting to a single-resource setting by defining nodes with a fixed amount of each resource (e.g. 4 CPU, 32 MB memory, etc.), and dividing slots which are fractions of nodes. But this method is inefficient, since not all users need the same ratio of resources. For example, some users need more CPU whereas other users need more memory. As a result, most tasks either under-utilize or over-utilize their resources.

DRF solves the problem by maximizing the minimum amount of the dominant resource given to a user (then the second-minimum etc., in a leximin order). The dominant resource may be different for different users. For example, if user A runs CPU-heavy tasks and user B runs memory-heavy tasks, DRF will try to equalize the CPU share given to user A and the memory share given to user B.

Definition

There are m resources. The total capacities of the resources are r1,...,rm.

There are n users. Each users runs individual tasks. Each task has a demand-vector (d1,..,dm), representing the amount it needs of each resource. It is implicitly assumed that the utility of a user equals the number of tasks he can perform. For example, if user A runs tasks with demand-vector [1 CPU, 4 GB RAM], and receives 3 CPU and 8 GB RAM, then his utility is 2, since he can perform only 2 tasks. More generally, the utility of a user receiving x1,...,xm resources is minj(xj/dj), that is, the users have Leontief utilities.

The demand-vectors are normalized to fractions of the capacities. For example, if the system has 9 CPUs and 18 GB RAM, then the above demand-vector is normalized to [1/9 CPU, 2/9 GB]. For each user, the resource with the highest demand-fraction is called the dominant resource. In the above example, the dominant resource is memory, as 2/9 is the largest fraction. If user B runs a task with demand-vector [3 CPU, 1 GB], which is normalized to [1/3 CPU, 1/18 GB], then his dominant resource is CPU.

DRF aims to find the maximum x such that all agents can receive at least x of their dominant resource. In the above example, this maximum x is 2/3:

  • User A gets 3 tasks, which require 3/9 CPU and 2/3 GB.
  • User B gets 2 tasks, which require 2/3 CPU and 1/9 GB.

The maximum x can be found by solving a linear program; see Lexicographic max-min optimization. Alternatively, the DRF can be computed sequentially.[1]:{{{1}}} The algorithm tracks the amount of dominant resource used by each user. At each round, it finds a user with the smallest allocated dominant resource so far, and allocates the next task of this user. Note that this procedure allows the same user to run tasks with different demand vectors.

Properties

DRF has several advantages over other policies for resource allocation.

  1. Proportionality: each user receives at least as much resources as he could get in a system in which all resources are partitioned equally among users (the authors call this condition "sharing incentive").
  2. Strategyproofness: a user cannot get a larger allocation by lying about his needs. Strategyproofness is important, as evidence from cloud operators show that users try to manipulate the servers in order to get better allocations.
  3. Envy-freeness: no user would prefer the allocation of another user.
  4. Pareto efficiency: no other allocation is better for some users and not worse for anyone.
  5. Population monotonicity: when a user leaves the system, the allocations of remaining users do not decrease.

When there is a single resource that is a bottleneck resource (highly demanded by all users), DRF reduces to max-min fairness.

However, DRF violates resource monotonicity: when resources are added to the system, some allocations might decrease.

Extensions

Weighted DRF is an extension of DRF to settings in which different users have different weights (representing their different entitlements).[1]:{{{1}}}

Parkes, Procaccia and Shah[2] formally extend weighted DRF to a setting in which some users do not need all resources (that is, they may have demand 0 to some resource). They prove that the extended version still satisfies proportionality, Pareto-efficiency, envy-freeness, strategyproofness, and even Group strategyproofness. On the other hand, they show that DRF may yield poor utilitarian social welfare, that is, the sum of utilities may be only 1/m of the optimum. However, they prove that any mechanism satisfying one of proportionality, envy-freeness or strategyproofness may suffers from the same low utilitarian welfare. They also extend DRF to the setting in which the users' demands are indivisible (as in fair item allocation). For the indivisible setting, they relax envy-freeness to EF1. They show that strategyproofness is incompatible with PO+EF1 or with PO+proportionality. However, a mechanism called SequentialMinMax satisfies efficiency, proportionality and EF1.

Wang, Li and Liang[3] present DRFH - an extension of DRF to a system with several heterogeneous servers.

Implementation

DRF was first implemented in Apache Mesos - a cluster resource manager, and it led to better throughput and fairness than previously used fair-sharing schemes.

See also

References