# Semiparametric model


In statistics, a semiparametric model is a statistical model that has parametric and nonparametric components. A statistical model is a parameterized family of distributions: $\displaystyle{ \{P_\theta: \theta \in \Theta\} }$ indexed by a parameter $\displaystyle{ \theta }$.

• A parametric model is a model in which the indexing parameter $\displaystyle{ \theta }$ is a vector in $\displaystyle{ k }$-dimensional Euclidean space, for some nonnegative integer $\displaystyle{ k }$. Thus, $\displaystyle{ \theta }$ is finite-dimensional, and $\displaystyle{ \Theta \subseteq \mathbb{R}^k }$.
• With a nonparametric model, the set of possible values of the parameter $\displaystyle{ \theta }$ is a subset of some space $\displaystyle{ V }$, which is not necessarily finite-dimensional. For example, we might consider the set of all distributions with mean 0. The ambient space $\displaystyle{ V }$ is typically a vector space with topological structure, but it need not be finite-dimensional as a vector space. Thus, $\displaystyle{ \Theta \subseteq V }$ for some possibly infinite-dimensional space $\displaystyle{ V }$.
• With a semiparametric model, the parameter has both a finite-dimensional component and an infinite-dimensional component (often a real-valued function defined on the real line). Thus, $\displaystyle{ \Theta \subseteq \mathbb{R}^k \times V }$, where $\displaystyle{ V }$ is an infinite-dimensional space.
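
As a concrete instance of the parametric case (a standard textbook illustration, not an example taken from the definitions above), the family of normal distributions can be written as

$\displaystyle{ \{ N(\mu, \sigma^2) : (\mu, \sigma^2) \in \mathbb{R} \times (0, \infty) \}, }$

so that $\displaystyle{ \theta = (\mu, \sigma^2) }$, $\displaystyle{ k = 2 }$, and $\displaystyle{ \Theta = \mathbb{R} \times (0, \infty) \subseteq \mathbb{R}^2 }$. Dropping the normality assumption, and letting the parameter be the distribution itself, would give a nonparametric model in the sense described above.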

It may appear at first that semiparametric models include nonparametric models, since they have an infinite-dimensional as well as a finite-dimensional component. However, a semiparametric model is considered to be "smaller" than a completely nonparametric model because we are often interested only in the finite-dimensional component of $\displaystyle{ \theta }$. That is, the infinite-dimensional component is regarded as a nuisance parameter. In nonparametric models, by contrast, the primary interest is in estimating the infinite-dimensional parameter. Thus the estimation task is statistically harder in nonparametric models.

Estimation in these models often relies on smoothing techniques, such as kernel methods, to handle the infinite-dimensional component.

## Example

A well-known example of a semiparametric model is the Cox proportional hazards model. If we are interested in studying the time $\displaystyle{ T }$ to an event such as death due to cancer or failure of a light bulb, the Cox model specifies the following distribution function for $\displaystyle{ T }$:

$\displaystyle{ F(t) = 1 - \exp\left(-\int_0^t \lambda_0(u) e^{\beta^\top x} du\right), }$

where $\displaystyle{ x }$ is the covariate vector, and $\displaystyle{ \beta }$ and $\displaystyle{ \lambda_0(u) }$ are unknown parameters, so that $\displaystyle{ \theta = (\beta, \lambda_0(u)) }$. Here $\displaystyle{ \beta }$ is finite-dimensional and is of interest; $\displaystyle{ \lambda_0(u) }$ is an unknown non-negative function of time (known as the baseline hazard function) and is often a nuisance parameter. The set of possible candidates for $\displaystyle{ \lambda_0(u) }$ is infinite-dimensional.
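
The split between the two components can be sketched in code. The following is a minimal illustration, not a fitting procedure: it only evaluates the distribution function above for given parameter values, treating the exponent as the inner product of $\displaystyle{ \beta }$ and $\displaystyle{ x }$. The function name `cox_cdf`, the trapezoidal integration, and all numerical values are assumptions made for this example. The finite-dimensional component enters as a list of numbers, while the infinite-dimensional component enters as an arbitrary function.

```python
import math
from typing import Callable, Sequence


def cox_cdf(
    t: float,
    x: Sequence[float],
    beta: Sequence[float],
    baseline_hazard: Callable[[float], float],
    n_steps: int = 10_000,
) -> float:
    """Evaluate F(t) = 1 - exp(-int_0^t lambda_0(u) * exp(beta . x) du)."""
    # Finite-dimensional component: beta enters only through the inner product beta . x.
    risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))

    # Infinite-dimensional component: lambda_0 is an arbitrary non-negative function,
    # integrated here over [0, t] with the trapezoidal rule (an illustrative choice).
    h = t / n_steps
    total = 0.5 * (baseline_hazard(0.0) + baseline_hazard(t))
    for i in range(1, n_steps):
        total += baseline_hazard(i * h)
    cumulative_baseline = total * h

    return 1.0 - math.exp(-cumulative_baseline * risk)


# Hypothetical parameter values, chosen only for illustration.
beta = [0.5, -0.2]
x = [1.0, 2.0]

# Two different candidates for the baseline hazard, same beta.
f_const = cox_cdf(3.0, x, beta, lambda u: 0.1)       # constant baseline hazard
f_incr = cox_cdf(3.0, x, beta, lambda u: 0.05 * u)   # linearly increasing baseline hazard
```

Changing the function passed as `baseline_hazard` changes the distribution without changing the dimension of `beta`, which is the sense in which the set of candidates for $\displaystyle{ \lambda_0(u) }$ is infinite-dimensional.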