Proof of O(log*n) time complexity of union–find

In computer science, Union Find is an algorithm for doing certain operations on sets. This page is about proof of O(log^*n) amortized time ^[1] of Union Find^[2]^[3]^[4]

Statement: If m operations, either Union or Find, are applied to n elements, the total run time is O(m log^*n), where log^* is the iterated logarithm.

Proof

Lemma 1: As the find function follows the path along to the root, the rank of node it encounters is increasing.

Proof: claim that as Find and Union operations are applied to the data set, this fact remains true over time. Initially when each node is the root of its own tree, it's trivially true. The only case when the rank of a node might be changed is when the Union by Rank operation is applied. In this case, a tree with smaller rank will be attached to a tree with greater rank, rather than vice versa. And during the find operation, all nodes visited along the path will be attached to the root, which has larger rank than its children, so this operation won't change this fact either.

Lemma 2: A node u which is root of a subtree with rank r has at least 2^r nodes.

Proof: Initially when each node is the root of its own tree, it's trivially true. Assume that a node u with rank r has at least 2^r nodes. Then when two trees with rank r Union by Rank and form a tree with rank r + 1, the new node has at least 2^r + 2^r = 2^{r + 1} nodes.

Lemma 3: The maximum number of nodes of rank r is at most n/2^r.

Proof: From lemma 2, we know that a node u which is root of a subtree with rank r has at least 2^r nodes. We will get the maximum number of nodes of rank r when each node with rank r is the root of a tree that has exactly 2^r nodes. In this case, the number of nodes of rank r is n/2^r

For convenience, we define "bucket" here: a bucket is a set that contains vertices with particular ranks.

We create some buckets and put vertices into the buckets according to their ranks inductively. That is, vertices with rank 0 go into the zeroth bucket, vertices with rank 1 go into the first bucket, vertices with ranks 2 and 3 go into the second bucket. If the Bth bucket contains vertices with ranks from interval [r, 2^r − 1] = [r, R - 1] then the (B+1)st bucket will contain vertices with ranks from interval [R, 2^R − 1].

Proof of [math]\displaystyle{ O(\log^*n) }[/math] Union Find

We can make two observations about the buckets.

The total number of buckets is at most log^*n
Proof: When we go from one bucket to the next, we add one more two to the power, that is, the next bucket to [B, 2^B − 1] will be [2^B, 2^{2^B} − 1]
The maximum number of elements in bucket [B, 2^B – 1] is at most 2n/2^B
Proof: The maximum number of elements in bucket [B, 2^B – 1] is at most n/2^B + n/2^B+1 + n/2^B+2 + … + n/2^{2^B – 1} ≤ 2n/2^B

Let F represent the list of "find" operations performed, and let

[math]\displaystyle{ T_1 = \sum_F\text{(link to the root)} }[/math]

[math]\displaystyle{ T_2 = \sum_F\text{(number of links traversed where the buckets are different)} }[/math]

[math]\displaystyle{ T_3 = \sum_F\text{(number of links traversed where the buckets are the same).} }[/math]

Then the total cost of m finds is T = T₁ + T₂ + T₃

Since each find operation makes exactly one traversal that leads to a root, we have T₁ = O(m).

Also, from the bound above on the number of buckets, we have T₂ = O(mlog^*n).

For T₃, suppose we are traversing an edge from u to v, where u and v have rank in the bucket [B, 2^B − 1] and v is not the root (at the time of this traversing, otherwise the traversal would be accounted for in T₁). Fix u and consider the sequence v₁,v₂,...,v_k that take the role of v in different find operations. Because of path compression and not accounting for the edge to a root, this sequence contains only different nodes and because of Lemma 1 we know that the ranks of the nodes in this sequence are strictly increasing. By both of the nodes being in the bucket we can conclude that the length k of the sequence (the number of times node u is attached to a different root in the same bucket) is at most the number of ranks in the buckets B, i.e. at most 2^B − 1 − B < 2^B.

Therefore, [math]\displaystyle{ T_3 \le \sum_{[B, 2^B - 1]} \sum_{u} 2^B. }[/math]

From Observations 1 and 2, we can conclude that [math]\displaystyle{ T_3\le\sum_{B} 2^B\frac{2n}{2^B}\le 2n\log^*n. }[/math]

Therefore, T = T₁ + T₂ + T₃ = O(m log^*n).

References

↑ Raimund Seidel, Micha Sharir. "Top-down analysis of path compression", SIAM J. Comput. 34(3):515–525, 2005
↑ Tarjan, Robert Endre (1975). "Efficiency of a Good But Not Linear Set Union Algorithm". Journal of the ACM 22 (2): 215–225. doi:10.1145/321879.321884. http://portal.acm.org/citation.cfm?id=321884.
↑ Hopcroft, J. E.; Ullman, J. D. (1973). "Set Merging Algorithms". SIAM Journal on Computing 2 (4): 294–303. doi:10.1137/0202024.
↑ Robert E. Tarjan and Jan van Leeuwen. Worst-case analysis of set union algorithms. Journal of the ACM, 31(2):245–281, 1984.

0.00

(0 votes)

[name_for_reference-1] Raimund Seidel, Micha Sharir. "Top-down analysis of path compression", SIAM J. Comput. 34(3):515–525, 2005

[2] Tarjan, Robert Endre (1975). "Efficiency of a Good But Not Linear Set Union Algorithm". Journal of the ACM 22 (2): 215–225. doi:10.1145/321879.321884. http://portal.acm.org/citation.cfm?id=321884.

[3] Hopcroft, J. E.; Ullman, J. D. (1973). "Set Merging Algorithms". SIAM Journal on Computing 2 (4): 294–303. doi:10.1137/0202024.

[4] Robert E. Tarjan and Jan van Leeuwen. Worst-case analysis of set union algorithms. Journal of the ACM, 31(2):245–281, 1984.

[1]

[2]

[3]

[4]

Anonymous

Search

Proof of O(log*n) time complexity of union–find

Namespaces

More

Page actions

Proof

References

Navigation

Navigation

Help

Translate

Wiki tools

Wiki tools

Anonymous

Search

Proof of O(log*n) time complexity of union–find

Proof

References

Navigation

Wiki tools

Page tools

Other projects

Categories