Axiom of limitation of size

John von Neumann

In set theory, the axiom of limitation of size was proposed by John von Neumann in his 1925 axiom system for sets and classes.[1] It formalizes the limitation of size principle, which avoids the paradoxes encountered in earlier formulations of set theory by recognizing that some classes are too big to be sets. Von Neumann realized that the paradoxes are caused by permitting these big classes to be members of a class.[2] A class that is a member of a class is a set; a class that is not a set is a proper class. Every class is a subclass of V, the class of all sets.[lower-alpha 1] The axiom of limitation of size says that a class is a set if and only if it is smaller than V—that is, there is no function mapping it onto V. Usually, this axiom is stated in the equivalent form: A class is a proper class if and only if there is a function that maps it onto V.

Von Neumann's axiom implies the axioms of replacement, separation, union, and global choice. It is equivalent to the combination of replacement, union, and global choice in von Neumann–Bernays–Gödel set theory (NBG) and Morse–Kelley set theory. Later expositions of class theories, such as those of Paul Bernays, Kurt Gödel, and John L. Kelley, use replacement, union, and a choice axiom equivalent to global choice rather than von Neumann's axiom.[3] In 1930, Ernst Zermelo defined models of set theory satisfying the axiom of limitation of size.[4]

Abraham Fraenkel and Azriel Lévy have stated that the axiom of limitation of size does not capture all of the "limitation of size doctrine" because it does not imply the power set axiom.[5] Michael Hallett has argued that the limitation of size doctrine does not justify the power set axiom and that "von Neumann's explicit assumption [of the smallness of power-sets] seems preferable to Zermelo's, Fraenkel's, and Lévy's obscurely hidden implicit assumption of the smallness of power-sets."[6]

Formal statement

The usual version of the axiom of limitation of size—a class is a proper class if and only if there is a function that maps it onto V—is expressed in the formal language of set theory as:

$\displaystyle{ \begin{aligned} \forall C \Bigl[ \lnot \exist D \left(C \in D\right) \iff \exist F \bigl[&\,\forall y \bigl(\exist D(y \in D) \implies \exist x [\,x \in C \land (x, y) \in F\,]\bigr) \\ &\, \land \, \forall x \forall y \forall z \bigl(\,[\,(x, y) \in F \land (x, z) \in F\,] \implies y = z\bigr)\,\bigr]\,\Bigr] \end{aligned} }$

Gödel introduced the convention that uppercase variables range over all the classes, while lowercase variables range over all the sets.[7] This convention allows us to write:

• $\displaystyle{ \exist y\, \varphi(y) }$ instead of $\displaystyle{ \exist y \bigl(\exist D (y \in D) \land \varphi(y)\bigr) }$
• $\displaystyle{ \forall y\, \varphi(y) }$ instead of $\displaystyle{ \forall y \bigl(\exist D (y \in D) \implies \varphi(y)\bigr) }$

With Gödel's convention, the axiom of limitation of size can be written:

$\displaystyle{ \begin{aligned} \forall C \Bigl[ \lnot \exist D \left( C \in D\right) \iff \exist F \bigl[&\,\forall y \exist x \bigl( x \in C \land (x, y) \in F \bigr) \\ &\, \land \, \forall x \forall y \forall z \bigl(\,[\,(x, y) \in F \land (x, z) \in F\,] \implies y = z\bigr)\,\bigr]\,\Bigr] \end{aligned} }$

Implications of the axiom

Von Neumann proved that the axiom of limitation of size implies the axiom of replacement, which can be expressed as: If F is a function and A is a set, then F(A) is a set. This is proved by contradiction. Let F be a function and A be a set. Assume that F(A) is a proper class. Then there is a function G that maps F(A) onto V. Since the composite function G ∘ F maps A onto V, the axiom of limitation of size implies that A is a proper class, which contradicts A being a set. Therefore, F(A) is a set. Since the axiom of replacement implies the axiom of separation, the axiom of limitation of size implies the axiom of separation.[lower-alpha 2]

Von Neumann also proved that his axiom implies that V can be well-ordered. The proof starts by proving by contradiction that Ord, the class of all ordinals, is a proper class. Assume that Ord is a set. Since it is a transitive set that is strictly well-ordered by ∈, it is an ordinal. So Ord ∈ Ord, which contradicts Ord being strictly well-ordered by ∈. Therefore, Ord is a proper class. So von Neumann's axiom implies that there is a function F that maps Ord onto V. To define a well-ordering of V, let G be the subclass of F consisting of the ordered pairs (α, x) where α is the least β such that (β, x) ∈ F; that is, G = {(α, x) ∈ F: ∀β((β, x) ∈ F ⇒ α ≤ β)}. The function G is a one-to-one correspondence between a subclass of Ord and V. Therefore, x < y if G⁻¹(x) < G⁻¹(y) defines a well-ordering of V. This well-ordering defines a global choice function: Let Inf(x) be the least element of a non-empty set x. Since Inf(x) ∈ x, this function chooses an element of x for every non-empty set x. Therefore, Inf(x) is a global choice function, so von Neumann's axiom implies the axiom of global choice.
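The construction of G and Inf can be illustrated with a finite toy analogue. The sketch below is only an illustration under stated assumptions: natural-number indices stand in for ordinals, a small finite set stands in for V, and the names `least_index_map`, `inf`, `universe`, and `F` are hypothetical and do not come from von Neumann's paper.

```python
def least_index_map(F, universe):
    """Return G^{-1}: for each x in `universe`, the least index n with (n, x) in F.

    F is a collection of (index, value) pairs that is functional (each index
    appears once) and maps onto `universe`. Both are toy stand-ins for the
    function from Ord onto V.
    """
    g_inv = {}
    for index, value in sorted(F, key=lambda pair: pair[0]):
        # Keep only the first (least) index that reaches each value.
        g_inv.setdefault(value, index)
    if set(g_inv) != set(universe):
        raise ValueError("F does not map onto the universe")
    return g_inv


def inf(x, g_inv):
    """Least element of the non-empty set x, where x < y iff G^{-1}(x) < G^{-1}(y)."""
    return min(x, key=g_inv.__getitem__)


universe = {"a", "b", "c"}
F = [(0, "b"), (1, "a"), (2, "b"), (3, "c"), (4, "a")]
g_inv = least_index_map(F, universe)
print(sorted(universe, key=g_inv.__getitem__))  # ['b', 'a', 'c']
print(inf({"a", "c"}, g_inv))                   # a
```

Discarding all but the least index for each value is what turns the many-to-one map F into the one-to-one map G; the induced order then selects a definite element from every non-empty subset.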

In 1968, Azriel Lévy proved that von Neumann's axiom implies the axiom of union. First, he proved without using the axiom of union that every set of ordinals has an upper bound. Then he used a function that maps Ord onto V to prove that if A is a set, then ∪A is a set.[8]

The axioms of replacement, global choice, and union (with the other axioms of NBG) imply the axiom of limitation of size.[lower-alpha 3] Therefore, this axiom is equivalent to the combination of replacement, global choice, and union in NBG or Morse–Kelley set theory. These set theories only substituted the axiom of replacement and a form of the axiom of choice for the axiom of limitation of size because von Neumann's axiom system contains the axiom of union. Lévy's proof that this axiom is redundant came many years later.[9]

The axioms of NBG with the axiom of global choice replaced by the usual axiom of choice do not imply the axiom of limitation of size. In 1964, William B. Easton used forcing to build a model of NBG with global choice replaced by the axiom of choice.[10] In Easton's model, V cannot be linearly ordered, so it cannot be well-ordered. Therefore, the axiom of limitation of size fails in this model. Ord is an example of a proper class that cannot be mapped onto V because (as proved above) if there is a function mapping Ord onto V, then V can be well-ordered.

The axioms of NBG with the axiom of replacement replaced by the weaker axiom of separation do not imply the axiom of limitation of size. Define $\displaystyle{ \omega_\alpha }$ as the $\displaystyle{ \alpha }$-th infinite initial ordinal, which is also the cardinal $\displaystyle{ \aleph_\alpha }$; numbering starts at $\displaystyle{ 0 }$, so $\displaystyle{ \omega_0 = \omega. }$ In 1939, Gödel pointed out that $\displaystyle{ L_{\omega_\omega} }$, a subset of the constructible universe, is a model of ZFC with replacement replaced by separation.[11] To expand it into a model of NBG with replacement replaced by separation, let its classes be the sets of $\displaystyle{ L_{\omega_{\omega+1}} }$, which are the constructible subsets of $\displaystyle{ L_{\omega_\omega} }$. This model satisfies NBG's class existence axioms because restricting the set variables of these axioms to $\displaystyle{ L_{\omega_\omega} }$ produces instances of the axiom of separation, which holds in L.[lower-alpha 4] It satisfies the axiom of global choice because there is a function belonging to $\displaystyle{ L_{\omega_{\omega+1}} }$ that maps $\displaystyle{ \omega_\omega }$ onto $\displaystyle{ L_{\omega_\omega} }$, which implies that $\displaystyle{ L_{\omega_\omega} }$ is well-ordered.[lower-alpha 5] The axiom of limitation of size fails because the proper class $\displaystyle{ \{\omega_n : n \in \omega\} }$ has cardinality $\displaystyle{ \aleph_0 }$, so it cannot be mapped onto $\displaystyle{ L_{\omega_\omega} }$, which has cardinality $\displaystyle{ \aleph_\omega }$.[lower-alpha 6]

In a 1923 letter to Zermelo, von Neumann stated the first version of his axiom: A class is a proper class if and only if there is a one-to-one correspondence between it and V.[2] The axiom of limitation of size implies von Neumann's 1923 axiom. Therefore, it also implies that all proper classes are equinumerous with V.

Zermelo's models and the axiom of limitation of size

Ernst Zermelo in the 1900s

In 1930, Zermelo published an article on models of set theory, in which he proved that some of his models satisfy the axiom of limitation of size.[4] These models are built in ZFC by using the cumulative hierarchy Vα, which is defined by transfinite recursion:

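1. $\displaystyle{ V_0 = \empty }$.[lower-alpha 8]
2. $\displaystyle{ V_{\alpha+1} = V_\alpha \cup P(V_\alpha) }$. That is, the union of Vα and its power set.[lower-alpha 9]
3. For limit β: $\displaystyle{ V_\beta = \bigcup_{\alpha < \beta} V_\alpha }$. That is, Vβ is the union of the preceding Vα.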
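The finite stages of this recursion can be computed directly. The following is a minimal sketch, assuming V0 = ∅ and representing sets as Python `frozenset` values; the helper names `power_set` and `finite_stages` are illustrative and not part of the source. Limit stages cannot be enumerated this way, so only the first two clauses are exercised.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of the frozenset s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def finite_stages(n):
    """Return [V_0, ..., V_n] using V_0 = ∅ and V_{k+1} = V_k ∪ P(V_k)."""
    stages = [frozenset()]
    for _ in range(n):
        prev = stages[-1]
        stages.append(prev | power_set(prev))
    return stages


stages = finite_stages(4)
print([len(v) for v in stages])  # [0, 1, 2, 4, 16]
```

The sizes grow as iterated powers of 2, so V5 already has 65,536 elements and V6 is far too large to enumerate.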

Zermelo worked with models of the form Vκ where κ is a cardinal. The classes of the model are the subsets of Vκ, and the model's ∈-relation is the standard ∈-relation. The sets of the model are the classes X such that X ∈ Vκ.[lower-alpha 10] Zermelo identified cardinals κ such that Vκ satisfies:[12]

Theorem 1. A class X is a set if and only if |X| < κ.
Theorem 2. |Vκ| = κ.

Since every class is a subset of Vκ, Theorem 2 implies that every class X has cardinality ≤ κ. Combining this with Theorem 1 proves: every proper class has cardinality κ. Hence, every proper class can be put into one-to-one correspondence with Vκ. This correspondence is a subset of Vκ, so it is a class of the model. Therefore, the axiom of limitation of size holds for the model Vκ.

The theorem stating that Vκ has a well-ordering can be proved directly. Since κ is an ordinal of cardinality κ and |Vκ| = κ, there is a one-to-one correspondence between κ and Vκ. This correspondence produces a well-ordering of Vκ. Von Neumann's proof is indirect. It uses the Burali-Forti paradox to prove by contradiction that the class of all ordinals is a proper class. Hence, the axiom of limitation of size implies that there is a function that maps the class of all ordinals onto the class of all sets. This function produces a well-ordering of Vκ.[13]

The model Vω

To demonstrate that Theorems 1 and 2 hold for some Vκ, we first prove that if a set belongs to Vα then it belongs to all subsequent Vβ, or equivalently: Vα ⊆ Vβ for α ≤ β. This is proved by transfinite induction on β:

1. β = 0: V0 ⊆ V0.
2. For β+1: By inductive hypothesis, Vα ⊆ Vβ. Hence, Vα ⊆ Vβ ⊆ Vβ ∪ P(Vβ) = Vβ+1.
3. For limit β: If α < β, then $\displaystyle{ V_\alpha \subseteq \bigcup_{\xi < \beta} V_\xi = V_\beta }$. If α = β, then Vα ⊆ Vβ.

Sets enter the cumulative hierarchy through the power set P(Vβ) at step β+1. The following definitions will be needed:

If x is a set, rank(x) is the least ordinal β such that x ∈ Vβ+1.[14]
The supremum of a set of ordinals A, denoted by sup A, is the least ordinal β such that α ≤ β for all α ∈ A.

Zermelo's smallest model is Vω. Mathematical induction proves that Vn is finite for all n < ω:

1. |V0| = 0.
2. $\displaystyle{ |V_{n+1}| = |V_n \cup P(V_n)| \le |V_n| + 2^{|V_n|} }$, which is finite since Vn is finite by inductive hypothesis.

Proof of Theorem 1: A set X enters Vω through P(Vn) for some n < ω, so X ⊆ Vn. Since Vn is finite, X is finite. Conversely: If a class X is finite, let N = sup {rank(x): x ∈ X}. Since rank(x) ≤ N for all x ∈ X, we have X ⊆ VN+1, so X ∈ VN+2 ⊆ Vω. Therefore, X ∈ Vω.
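The converse direction can be checked on a small example. The sketch below assumes V0 = ∅ and encodes hereditarily finite sets as `frozenset` values; the names `stage`, `rank`, and the sample class `X` are illustrative choices, not taken from Zermelo's paper. It computes N = sup {rank(x): x ∈ X} and confirms that X ⊆ VN+1 and X ∈ VN+2.

```python
from itertools import combinations


def power_set(s):
    """Return the set of all subsets of the frozenset s, each as a frozenset."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


def stage(n):
    """V_n for finite n, with V_0 = ∅ and V_{k+1} = V_k ∪ P(V_k)."""
    v = frozenset()
    for _ in range(n):
        v = v | power_set(v)
    return v


def rank(x):
    """Least n such that x ∈ V_{n+1}, computed by recursion on the elements of x."""
    return max((rank(y) + 1 for y in x), default=0)


empty = frozenset()
one = frozenset({empty})               # {∅}, rank 1
X = frozenset({empty, one, frozenset({one})})  # a finite class with ranks 0, 1, 2

N = max((rank(x) for x in X), default=0)
print(N, X <= stage(N + 1), X in stage(N + 2))  # 2 True True
```

The example is kept to N = 2 so that only V4, with 16 elements, has to be enumerated.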

Proof of Theorem 2: Vω is the union of countably infinitely many finite sets of increasing size. Hence, it has cardinality $\displaystyle{ \aleph_0 }$, which equals ω by von Neumann cardinal assignment.
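The countability of Vω can also be seen from an explicit bijection with the natural numbers. The sketch below uses Ackermann's coding of hereditarily finite sets, a standard technique that is not part of the proof above: a set x is sent to the sum of 2 raised to the codes of its elements. The function names `encode` and `decode` are illustrative.

```python
def encode(x):
    """Ackermann code of a hereditarily finite set: sum of 2**encode(y) for y in x."""
    return sum(2 ** encode(y) for y in x)


def decode(n):
    """Inverse of encode: the set of decode(i) for every bit position i set in n."""
    return frozenset(decode(i) for i in range(n.bit_length()) if (n >> i) & 1)


# Round trip on an initial segment of the natural numbers.
assert all(encode(decode(n)) == n for n in range(200))

print(decode(0) == frozenset())                          # True
print(decode(3) == frozenset({decode(0), decode(1)}))    # True
```

Every natural number decodes to a distinct hereditarily finite set, and every such set has a code, which matches the conclusion that Vω has cardinality $\displaystyle{ \aleph_0 }$.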

The sets and classes of Vω satisfy all the axioms of NBG except the axiom of infinity.[lower-alpha 11]

The models Vκ where κ is a strongly inaccessible cardinal

Two properties of finiteness were used to prove Theorems 1 and 2 for Vω:

1. If λ is a finite cardinal, then $\displaystyle{ 2^\lambda }$ is finite.
2. If A is a set of ordinals such that |A| is finite, and α is finite for all α ∈ A, then sup A is finite.

To find models satisfying the axiom of infinity, replace "finite" by "< κ" to produce the properties that define strongly inaccessible cardinals. A cardinal κ is strongly inaccessible if κ > ω and:

1. If λ is a cardinal such that λ < κ, then $\displaystyle{ 2^\lambda < \kappa }$.
2. If A is a set of ordinals such that |A| < κ, and α < κ for all α ∈ A, then sup A < κ.

These properties assert that κ cannot be reached from below. The first property says κ cannot be reached by power sets; the second says κ cannot be reached by the axiom of replacement.[lower-alpha 12] Just as the axiom of infinity is required to obtain ω, an axiom is needed to obtain strongly inaccessible cardinals. Zermelo postulated the existence of an unbounded sequence of strongly inaccessible cardinals.[lower-alpha 13]

If κ is a strongly inaccessible cardinal, then transfinite induction proves |Vα| < κ for all α < κ:

1. α = 0: |V0| = 0.
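2. For α+1: $\displaystyle{ |V_{\alpha+1}| = |V_\alpha \cup P(V_\alpha)| \le |V_\alpha| + 2^{|V_\alpha|} = 2^{|V_\alpha|} < \kappa }$. The last inequality uses the inductive hypothesis and κ being strongly inaccessible.
3. For limit α: $\displaystyle{ |V_\alpha| = \bigl|\bigcup_{\xi < \alpha} V_\xi\bigr| \le \sup \{|V_\xi| : \xi < \alpha\} < \kappa }$. The last inequality uses the inductive hypothesis and κ being strongly inaccessible.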

Proof of Theorem 1: A set X enters Vκ through P(Vα) for some α < κ, so X ⊆ Vα. Since |Vα| < κ, we obtain |X| < κ. Conversely: If a class X has |X| < κ, let β = sup {rank(x): x ∈ X}. Because κ is strongly inaccessible, |X| < κ and rank(x) < κ for all x ∈ X imply β = sup {rank(x): x ∈ X} < κ. Since rank(x) ≤ β for all x ∈ X, we have X ⊆ Vβ+1, so X ∈ Vβ+2 ⊆ Vκ. Therefore, X ∈ Vκ.

Proof of Theorem 2: $\displaystyle{ |V_\kappa| = \bigl|\bigcup_{\alpha < \kappa} V_\alpha\bigr| \le \sup \{|V_\alpha| : \alpha < \kappa\} }$. Let β be this supremum. Since each ordinal in the supremum is less than κ, we have β ≤ κ. Assume β < κ. Then there is a cardinal λ such that β < λ < κ; for example, let $\displaystyle{ \lambda = 2^{|\beta|} }$. Since λ ⊆ Vλ and |Vλ| is in the supremum, we have λ ≤ |Vλ| ≤ β. This contradicts β < λ. Therefore, |Vκ| = β = κ.

The sets and classes of Vκ satisfy all the axioms of NBG.[lower-alpha 14]

Limitation of size doctrine

The limitation of size doctrine is a heuristic principle that is used to justify axioms of set theory. It avoids the set theoretical paradoxes by restricting the full (contradictory) comprehension axiom schema:

$\displaystyle{ \forall w_1,\ldots,w_n \, \exists x \, \forall u \, ( u \in x \iff \varphi(u, w_1, \ldots, w_n) ) }$

to instances "that do not give sets 'too much bigger' than the ones they use."[15]

If "bigger" means "bigger in cardinal size," then most of the axioms can be justified: The axiom of separation produces a subset of x that is not bigger than x. The axiom of replacement produces an image set f(x) that is not bigger than x. The axiom of union produces a union whose size is not bigger than the size of the biggest set in the union times the number of sets in the union.[16] The axiom of choice produces a choice set whose size is not bigger than the size of the given set of nonempty sets.

The limitation of size doctrine does not justify the axiom of infinity:

$\displaystyle{ \exists y \, [\empty \in y \, \land \, \forall x (x \in y \implies x \cup \{x\} \in y)], }$

which uses the empty set and sets obtained from the empty set by iterating the ordinal successor operation. Since these sets are finite, any set satisfying this axiom, such as ω, is much bigger than these sets. Fraenkel and Lévy regard the empty set and the infinite set of natural numbers, whose existence is implied by the axioms of infinity and separation, as the starting point for generating sets.[17]

Von Neumann's approach to limitation of size uses the axiom of limitation of size. As mentioned in § Implications of the axiom, von Neumann's axiom implies the axioms of separation, replacement, union, and choice. Like Fraenkel and Lévy, von Neumann had to add the axiom of infinity to his system since it cannot be proved from his other axioms.[lower-alpha 15] The differences between von Neumann's approach to limitation of size and Fraenkel and Lévy's approach are:

• Von Neumann's axiom puts limitation of size into an axiom system, making it possible to prove most set existence axioms. The limitation of size doctrine justifies axioms using informal arguments that are more open to disagreement than a proof.
• Von Neumann assumed the power set axiom since it cannot be proved from his other axioms.[lower-alpha 16] Fraenkel and Lévy state that the limitation of size doctrine justifies the power set axiom.[18]

There is disagreement on whether the limitation of size doctrine justifies the power set axiom. Michael Hallett has analyzed the arguments given by Fraenkel and Lévy. Some of their arguments measure size by criteria other than cardinal size—for example, Fraenkel introduces "comprehensiveness" and "extendability." Hallett points out what he considers to be flaws in their arguments.[19]

Hallett then argues that results in set theory seem to imply that there is no link between the size of an infinite set and the size of its power set. This would imply that the limitation of size doctrine is incapable of justifying the power set axiom because it requires that the power set of x is not "too much bigger" than x. For the case where size is measured by cardinal size, Hallett mentions Paul Cohen's work.[20] Starting with a model of ZFC and $\displaystyle{ \aleph_\alpha }$, Cohen built a model in which the cardinality of the power set of ω is $\displaystyle{ \aleph_\alpha }$ if the cofinality of $\displaystyle{ \aleph_\alpha }$ is not ω; otherwise, its cardinality is $\displaystyle{ \aleph_{\alpha+1} }$.[21] Since the cardinality of the power set of ω has no bound, there is no link between the cardinal size of ω and the cardinal size of P(ω).[22]

Hallett also discusses the case where size is measured by "comprehensiveness," which considers a collection "too big" if it is of "unbounded comprehension" or "unlimited extent."[23] He points out that for an infinite set, we cannot be sure that we have all its subsets without going through the unlimited extent of the universe. He also quotes John L. Bell and Moshé Machover: "... the power set P(u) of a given [infinite] set u is proportional not only to the size of u but also to the 'richness' of the entire universe ..."[24] After making these observations, Hallett states: "One is led to suspect that there is simply no link between the size (comprehensiveness) of an infinite a and the size of P(a)."[20]

Hallett considers the limitation of size doctrine valuable for justifying most of the axioms of set theory. His arguments only indicate that it cannot justify the axioms of infinity and power set.[25] He concludes that "von Neumann's explicit assumption [of the smallness of power-sets] seems preferable to Zermelo's, Fraenkel's, and Lévy's obscurely hidden implicit assumption of the smallness of power-sets."[6]

History

Von Neumann developed the axiom of limitation of size as a new method of identifying sets. ZFC identifies sets via its set building axioms. However, as Abraham Fraenkel pointed out: "The rather arbitrary character of the processes which are chosen in the axioms of Z [ZFC] as the basis of the theory, is justified by the historical development of set-theory rather than by logical arguments."[26]

The historical development of the ZFC axioms began in 1908 when Zermelo chose axioms to eliminate the paradoxes and to support his proof of the well-ordering theorem.[lower-alpha 17] In 1922, Abraham Fraenkel and Thoralf Skolem pointed out that Zermelo's axioms cannot prove the existence of the set {Z0, Z1, Z2, ...} where Z0 is the set of natural numbers, and Zn+1 is the power set of Zn.[27] They also introduced the axiom of replacement, which guarantees the existence of this set.[28] However, adding axioms as they are needed neither guarantees the existence of all reasonable sets nor clarifies the difference between sets that are safe to use and collections that lead to contradictions.

In a 1923 letter to Zermelo, von Neumann outlined an approach to set theory that identifies sets that are "too big" and might lead to contradictions.[lower-alpha 18] Von Neumann identified these sets using the criterion: "A set is 'too big' if and only if it is equivalent with the set of all things." He then restricted how these sets may be used: "... in order to avoid the paradoxes those [sets] which are 'too big' are declared to be impermissible as elements."[29] By combining this restriction with his criterion, von Neumann obtained his first version of the axiom of limitation of size, which in the language of classes states: A class is a proper class if and only if it is equinumerous with V.[2] By 1925, von Neumann had modified his axiom by changing "it is equinumerous with V" to "it can be mapped onto V", which produces the axiom of limitation of size. This modification allowed von Neumann to give a simple proof of the axiom of replacement.[1] Von Neumann's axiom identifies sets as classes that cannot be mapped onto V. Von Neumann realized that, even with this axiom, his set theory does not fully characterize sets.[lower-alpha 19]

Gödel found von Neumann's axiom to be "of great interest":

"In particular I believe that his [von Neumann's] necessary and sufficient condition which a property must satisfy, in order to define a set, is of great interest, because it clarifies the relationship of axiomatic set theory to the paradoxes. That this condition really gets at the essence of things is seen from the fact that it implies the axiom of choice, which formerly stood quite apart from other existential principles. The inferences, bordering on the paradoxes, which are made possible by this way of looking at things, seem to me, not only very elegant, but also very interesting from the logical point of view.[lower-alpha 20] Moreover I believe that only by going farther in this direction, i.e., in the direction opposite to constructivism, will the basic problems of abstract set theory be solved."[30]

Notes

1. Proof: Let A be a class and X ∈ A. Then X is a set, so X ∈ V. Therefore, A ⊆ V.
2. Proof that uses von Neumann's axiom: Let A be a set and B be the subclass produced by the axiom of separation. Using proof by contradiction, assume B is a proper class. Then there is a function F mapping B onto V. Define the function G mapping A to V: if x ∈ B then G(x) = F(x); if x ∈ A \ B then G(x) = ∅. Since F maps B onto V, G maps A onto V. So the axiom of limitation of size implies that A is a proper class, which contradicts A being a set. Therefore, B is a set.
3. This can be rephrased as: NBG implies the axiom of limitation of size. In 1929, von Neumann proved that the axiom system that later evolved into NBG implies the axiom of limitation of size. (Ferreirós 2007, p. 380.)
4. An axiom's set variable is restricted on the right side of the "if and only if." Also, an axiom's class variables are converted to set variables. For example, the class existence axiom $\displaystyle{ \forall A \, \exists B \, \forall u \, [u \in B \Leftrightarrow u \notin A] }$ becomes $\displaystyle{ \forall a \, \exists b \, \forall u \, [u \in b \Leftrightarrow (u \in L_{\omega_\omega} \land u \notin a)]. }$ The class existence axioms are in Gödel 1940, p. 5.
5. Gödel defined a function $\displaystyle{ F }$ that maps the class of ordinals onto $\displaystyle{ L }$. The function $\displaystyle{ {F|}_{\omega_\omega} }$ (which is the restriction of $\displaystyle{ F }$ to $\displaystyle{ \omega_\omega }$) maps $\displaystyle{ \omega_\omega }$ onto $\displaystyle{ L_{\omega_\omega} }$, and it belongs to $\displaystyle{ L_{\omega_{\omega+1}} }$ because it is a constructible subset of $\displaystyle{ L_{\omega_\omega} }$. Gödel uses the notation $\displaystyle{ F''\omega_\alpha }$ for $\displaystyle{ L_{\omega_\alpha} }$. (Gödel 1940, pp. 37–38, 54.)
6. Proof by contradiction that $\displaystyle{ \{\omega_n: n \in \omega\} }$ is a proper class: Assume that it is a set. By the axiom of union, $\displaystyle{ \cup\,\{\omega_n: n \in \omega\} }$ is a set. This union equals $\displaystyle{ \omega_\omega }$, the model's proper class of all ordinals, which contradicts the union being a set. Therefore, $\displaystyle{ \{\omega_n: n \in \omega\} }$ is a proper class.
Proof that $\displaystyle{ |L_{\omega_\omega}| = \aleph_\omega\!: }$ The function $\displaystyle{ {F|}_{\omega_\omega} }$ maps $\displaystyle{ \omega_\omega }$ onto $\displaystyle{ L_{\omega_\omega} }$, so $\displaystyle{ |L_{\omega_\omega}| \le |\omega_\omega|. }$ Also, $\displaystyle{ \omega_\omega \subseteq L_{\omega_\omega} }$ implies $\displaystyle{ |\omega_\omega| \le |L_{\omega_\omega}|. }$ Therefore, $\displaystyle{ |L_{\omega_\omega}| = |\omega_\omega| = \aleph_\omega. }$
7. This is the first half of theorem 7.7 in Gödel 1940, p. 27. Gödel defines the order isomorphism $\displaystyle{ F: (Ord, \lt ) \rightarrow (A, \lt ) }$ by transfinite recursion: $\displaystyle{ F(\alpha) = Inf(A \setminus \{F(\beta): \beta \in \alpha\}). }$
8. This is the standard definition of V0. Zermelo let V0 be a set of urelements and proved that if this set contains a single element, the resulting model satisfies the axiom of limitation of size (his proof also works for V0 = ∅). Zermelo stated that the axiom is not true for all models built from a set of urelements. (Zermelo 1930, p. 38; English translation: Ewald 1996, p. 1227.)
9. This is Zermelo's definition (Zermelo 1930, p. 36; English translation: Ewald 1996, p. 1225.). If V0 = ∅, this definition is equivalent to the standard definition Vα+1 = P(Vα) since Vα ⊆ P(Vα) (Kunen 1980, p. 95; Kunen uses the notation R(α) instead of Vα). If V0 is a set of urelements, the standard definition eliminates the urelements at V1.
10. If X is a set, then there is a class Y such that X ∈ Y. Since Y ⊆ Vκ, we have X ∈ Vκ. Conversely: if X ∈ Vκ, then X belongs to a class, so X is a set.
11. Zermelo proved that Vω satisfies ZFC without the axiom of infinity. The class existence axioms of NBG (Gödel 1940, p. 5) are true because Vω is a set when viewed from the set theory that constructs it (namely, ZFC). Therefore, the axiom of separation produces subsets of Vω that satisfy the class existence axioms.
12. Zermelo introduced strongly inaccessible cardinals κ so that Vκ would satisfy ZFC. The axioms of power set and replacement led him to the properties of strongly inaccessible cardinals. (Zermelo 1930, pp. 31–35; English translation: Ewald 1996, pp. 1221–1224.) Independently, Wacław Sierpiński and Alfred Tarski introduced these cardinals in 1930. (Sierpiński & Tarski 1930.)
13. Zermelo used this sequence of cardinals to obtain a sequence of models that explains the paradoxes of set theory — such as, the Burali-Forti paradox and Russell's paradox. He stated that the paradoxes "depend solely on confusing set theory itself ... with individual models representing it. What appears as an 'ultrafinite non- or super-set' in one model is, in the succeeding model, a perfectly good, valid set with both a cardinal number and an ordinal type, and is itself a foundation stone for the construction of a new domain [model]." (Zermelo 1930, pp. 46–47; English translation: Ewald 1996, p. 1223.)
14. Zermelo proved that Vκ satisfies ZFC if κ is a strongly inaccessible cardinal. The class existence axioms of NBG (Gödel 1940, p. 5) are true because Vκ is a set when viewed from the set theory that constructs it (namely, ZFC + there exist infinitely many strongly inaccessible cardinals). Therefore, the axiom of separation produces subsets of Vκ that satisfy the class existence axioms.
15. The model whose sets are the elements of $\displaystyle{ V_\omega }$ and whose classes are the subsets of $\displaystyle{ V_\omega }$ satisfies all of his axioms except for the axiom of infinity, which fails because all sets are finite.
16. The model whose sets are the elements of $\displaystyle{ L_{\omega_1} }$ and whose classes are the elements of $\displaystyle{ L_{\omega_2} }$ satisfies all of his axioms except for the power set axiom. This axiom fails because all sets are countable.
17. "... we must, on the one hand, restrict these principles [axioms] sufficiently to exclude all contradictions and, on the other hand, take them sufficiently wide to retain all that is valuable in this theory." (Zermelo 1908, p. 261; English translation: van Heijenoort 1967a, p. 200). Gregory Moore argues that Zermelo's "axiomatization was primarily motivated by a desire to secure his demonstration of the Well-Ordering Theorem ..." (Moore 1982, pp. 158–160).
18. Von Neumann published an introductory article on his axiom system in 1925 (von Neumann 1925; English translation: van Heijenoort 1967c). In 1928, he provided a detailed treatment of his system (von Neumann 1928).
19. Von Neumann investigated whether his set theory is categorical; that is, whether it uniquely determines sets in the sense that any two of its models are isomorphic. He showed that it is not categorical because of a weakness in the axiom of regularity: this axiom only excludes descending ∈-sequences from existing in the model; descending sequences may still exist outside the model. A model having "external" descending sequences is not isomorphic to a model having no such sequences since this latter model lacks isomorphic images for the sets belonging to external descending sequences. This led von Neumann to conclude "that no categorical axiomatization of set theory seems to exist at all" (von Neumann 1925, p. 239; English translation: van Heijenoort 1967c, p. 412).
20. For example, von Neumann's proof that his axiom implies the well-ordering theorem uses the Burali-Forti paradox (von Neumann 1925, p. 223; English translation: van Heijenoort 1967c, p. 398).

References

1. von Neumann 1925, p. 223; English translation: van Heijenoort 1967c, pp. 397–398.
2. Hallett 1984, p. 290.
3. Bernays 1937, pp. 66–70; Bernays 1941, pp. 1–6. Gödel 1940, pp. 3–7. Kelley 1955, pp. 251–273.
4. Zermelo 1930; English translation: Ewald 1996.
5. Fraenkel, Bar-Hillel & Levy 1973, p. 137.
6. Hallett 1984, p. 295.
7. Gödel 1940, p. 3.
8. Levy 1968.
9. It came 43 years later: von Neumann stated his axioms in 1925 and Lévy's proof appeared in 1968. (von Neumann 1925, Levy 1968.)
10. Easton 1964, pp. 56a–64.
11. Gödel 1939, p. 223.
12. These theorems are part of Zermelo's Second Development Theorem. (Zermelo 1930, p. 37; English translation: Ewald 1996, p. 1226.)
13. von Neumann 1925, p. 223; English translation: van Heijenoort 1967c, p. 398. Von Neumann's proof, which only uses axioms, has the advantage of applying to all models rather than just to Vκ.
14. Kunen 1980, p. 95.
15. Fraenkel, Bar-Hillel & Levy 1973, pp. 32, 137.
16. Hallett 1984, p. 205.
17. Fraenkel, Bar-Hillel & Levy 1973, p. 95.
18. Hallett 1984, pp. 200, 202.
19. Hallett 1984, pp. 200–207.
20. Hallett 1984, pp. 206–207.
21. Cohen 1966, p. 134.
22. Hallett 1984, p. 207.
23. Hallett 1984, p. 200.
24. Bell & Machover 2007, p. 509.
25. Hallett 1984, pp. 209–210.
26. Historical Introduction in Bernays 1991, p. 31.
27. Fraenkel 1922, pp. 230–231. Skolem 1922; English translation: van Heijenoort 1967b, pp. 296–297.
28. Ferreirós 2007, p. 369. In 1917, Dmitry Mirimanoff published a form of replacement based on cardinal equivalence (Mirimanoff 1917, p. 49).
29. Hallett 1984, pp. 288, 290.
30. From a Nov. 8, 1957 letter Gödel wrote to Stanislaw Ulam (Kanamori 2003, p. 295).