Concept class

From HandWiki
Short description: Computational learning theory

}}In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory.

Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning.[1] In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map [math]\displaystyle{ c: X\to Y }[/math], i.e. from examples to classifier labels (where [math]\displaystyle{ Y = \{0, 1\} }[/math] and where c is a subset of X), c is then said to be a concept. A concept class [math]\displaystyle{ C }[/math] is then a collection of such concepts.

Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s.[2] Not every subclass is reachable.[2][why?]

Background

A sample [math]\displaystyle{ s }[/math] is a partial function from [math]\displaystyle{ X }[/math][clarification needed] to [math]\displaystyle{ \{0, 1\} }[/math].[2] Identifying a concept with its characteristic function mapping [math]\displaystyle{ X }[/math] to [math]\displaystyle{ \{0, 1\} }[/math], it is a special case of a sample.[2]

Two samples are consistent if they agree on the intersection of their domains.[2] A sample [math]\displaystyle{ s' }[/math] extends another sample [math]\displaystyle{ s }[/math] if the two are consistent and the domain of [math]\displaystyle{ s }[/math] is contained in the domain of [math]\displaystyle{ s' }[/math].[2]

Examples

Suppose that [math]\displaystyle{ C = S^+(X) }[/math]. Then:

  • the subclass [math]\displaystyle{ \{\{x\}\} }[/math] is reachable with the sample [math]\displaystyle{ s = \{(x, 1)\} }[/math];[2][why?]
  • the subclass [math]\displaystyle{ S^+(Y) }[/math] for [math]\displaystyle{ Y\subseteq X }[/math] are reachable with a sample that maps the elements of [math]\displaystyle{ X - Y }[/math] to zero;[2][why?]
  • the subclass [math]\displaystyle{ S(X) }[/math], which consists of the singleton sets, is not reachable.[2][why?]

Applications

Let [math]\displaystyle{ C }[/math] be some concept class. For any concept [math]\displaystyle{ c\in C }[/math], we call this concept [math]\displaystyle{ 1/d }[/math]-good for a positive integer [math]\displaystyle{ d }[/math] if, for all [math]\displaystyle{ x\in X }[/math], at least [math]\displaystyle{ 1/d }[/math] of the concepts in [math]\displaystyle{ C }[/math] agree with [math]\displaystyle{ c }[/math] on the classification of [math]\displaystyle{ x }[/math].[2] The fingerprint dimension [math]\displaystyle{ FD(C) }[/math] of the entire concept class [math]\displaystyle{ C }[/math] is the least positive integer [math]\displaystyle{ d }[/math] such that every reachable subclass [math]\displaystyle{ C'\subseteq C }[/math] contains a concept that is [math]\displaystyle{ 1/d }[/math]-good for it.[2] This quantity can be used to bound the minimum number of equivalence queries[clarification needed] needed to learn a class of concepts according to the following inequality:[math]\displaystyle{ FD(C) - 1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil }[/math].[2]

References