Grouped Dirichlet distribution

Short description: Probability distribution

In statistics, the grouped Dirichlet distribution (GDD) is a multivariate generalization of the Dirichlet distribution. It was first described by Ng et al. in 2008.^[1] The grouped Dirichlet distribution arises in the analysis of categorical data where some observations cannot be assigned to a single 'crisp' category but could fall into any one of a set of such categories. For example, one may have a data set consisting of cases and controls under two different conditions. With complete data, the cross-classification of disease status forms a 2 × 2 table (case/control by treatment/no treatment) with cell probabilities

	Treatment	No Treatment
Controls	θ₁	θ₂
Cases	θ₃	θ₄

If, however, the data include, say, non-respondents who are known to be controls or cases but whose treatment status was not recorded, then the cross-classification of disease status forms a 2 × 3 table. In each row, the probability in the last column is the sum of the probabilities in the first two columns, e.g.

	Treatment	No Treatment	Missing
Controls	θ₁	θ₂	θ₁+θ₂
Cases	θ₃	θ₄	θ₃+θ₄

The GDD allows the full estimation of the cell probabilities under such aggregation conditions.^[1]
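To make the aggregation concrete, the following Python sketch evaluates the log-likelihood kernel implied by the 2 × 3 table. It assumes the missingness is ignorable, so that each fully classified subject contributes its cell probability and each subject with missing treatment status contributes the corresponding row sum. The function name `grouped_log_likelihood` and the counts are hypothetical and chosen only for illustration.

```python
import math


def grouped_log_likelihood(theta, observed, missing):
    """Log-likelihood kernel for the 2 x 3 table (up to an additive constant).

    theta    -- (theta1, theta2, theta3, theta4), cell probabilities summing to 1
    observed -- counts in the four fully classified cells, in the same order
    missing  -- (controls with missing treatment, cases with missing treatment)
    """
    if abs(sum(theta) - 1.0) > 1e-9 or min(theta) <= 0:
        raise ValueError("theta must lie in the interior of the simplex")
    # Fully classified subjects contribute theta_i ** n_i.
    log_lik = sum(n * math.log(t) for n, t in zip(observed, theta))
    # Subjects with missing treatment contribute the row sums
    # (theta1 + theta2) for controls and (theta3 + theta4) for cases.
    log_lik += missing[0] * math.log(theta[0] + theta[1])
    log_lik += missing[1] * math.log(theta[2] + theta[3])
    return log_lik


# Hypothetical counts, for illustration only.
print(grouped_log_likelihood((0.25, 0.25, 0.25, 0.25), (10, 12, 8, 9), (3, 4)))
```

Under these assumptions the kernel is a product of powers of the individual cell probabilities multiplied by powers of the two row sums, which is the functional form of the grouped Dirichlet density.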

Probability Distribution

Consider the closed simplex $𝒯_{n} = \{(x_{1}, \dots, x_{n}) \mid x_{i} \geq 0, i = 1, \dots, n, \sum_{i = 1}^{n} x_{i} = 1\}$ and $𝐱 \in 𝒯_{n}$ . Writing $𝐱_{- n} = (x_{1}, \dots, x_{n - 1})$ for the first $n - 1$ elements of a member of $𝒯_{n}$ , the grouped Dirichlet distribution of $𝐱$ with two partitions, in which the first $s$ components form one group and the remaining $n - s$ components form the other, has a density function given by

$$\mathrm{GD}_{n, 2, s} (𝐱_{- n} | 𝐚, 𝐛) = \frac{\left(\prod_{i = 1}^{n} x_{i}^{a_{i} - 1}\right) \cdot \left(\sum_{i = 1}^{s} x_{i}\right)^{b_{1}} \cdot \left(\sum_{i = s + 1}^{n} x_{i}\right)^{b_{2}}}{B (a_{1}, \dots, a_{s}) \cdot B (a_{s + 1}, \dots, a_{n}) \cdot B (b_{1} + \sum_{i = 1}^{s} a_{i}, b_{2} + \sum_{i = s + 1}^{n} a_{i})}$$

where $B (𝐚)$ is the multivariate beta function.
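As a minimal sketch of how the two-partition density could be evaluated numerically, the Python code below works on the log scale using the standard identity $\log B(𝐚) = \sum_i \log \Gamma(a_i) - \log \Gamma(\sum_i a_i)$. The function names are hypothetical, and the sketch assumes $𝐱$ lies in the interior of the simplex so that all logarithms are finite.

```python
import math


def log_multivariate_beta(alpha):
    """log B(alpha) = sum(log Gamma(alpha_i)) - log Gamma(sum(alpha_i))."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def gdd2_log_density(x, a, b, s):
    """Log density of the two-partition grouped Dirichlet distribution.

    x -- full vector (x_1, ..., x_n) on the simplex (interior assumed)
    a -- parameters (a_1, ..., a_n)
    b -- parameters (b_1, b_2)
    s -- size of the first group, with 0 < s < n
    """
    x, a = list(x), list(a)
    if len(x) != len(a) or not 0 < s < len(x):
        raise ValueError("need len(x) == len(a) and 0 < s < n")
    if abs(sum(x) - 1.0) > 1e-9 or min(x) <= 0:
        raise ValueError("x must lie in the interior of the simplex")
    # Numerator: product of x_i^(a_i - 1) times the two group sums raised to b_1, b_2.
    log_num = sum((ai - 1) * math.log(xi) for xi, ai in zip(x, a))
    log_num += b[0] * math.log(sum(x[:s])) + b[1] * math.log(sum(x[s:]))
    # Denominator: the product of the three multivariate beta functions.
    log_const = (
        log_multivariate_beta(a[:s])
        + log_multivariate_beta(a[s:])
        + log_multivariate_beta([b[0] + sum(a[:s]), b[1] + sum(a[s:])])
    )
    return log_num - log_const


# With b = (0, 0) and all a_i = 1 the density is uniform on the simplex.
print(math.exp(gdd2_log_density([0.1, 0.2, 0.3, 0.4], [1, 1, 1, 1], (0, 0), 2)))
```

Setting $b_{1} = b_{2} = 0$ makes the three beta functions in the denominator collapse to $B (a_{1}, \dots, a_{n})$, so the sketch then reproduces the ordinary Dirichlet density, which gives a convenient sanity check.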

Ng et al.^[1] went on to define an m partition grouped Dirichlet distribution with density of $𝐱_{- n}$ given by

$$\mathrm{GD}_{n, m, 𝐬} (𝐱_{- n} | 𝐚, 𝐛) = c_{m}^{- 1} \cdot \left(\prod_{i = 1}^{n} x_{i}^{a_{i} - 1}\right) \cdot \prod_{j = 1}^{m} \left(\sum_{k = s_{j - 1} + 1}^{s_{j}} x_{k}\right)^{b_{j}}$$

where $𝐬 = (s_{1}, \dots, s_{m})$ is a vector of integers with $0 = s_{0} < s_{1} ⩽ \dots ⩽ s_{m} = n$ . The normalizing constant is given by

$$c_{m} = \left\{\prod_{j = 1}^{m} B (a_{s_{j - 1} + 1}, \dots, a_{s_{j}})\right\} \cdot B \left(b_{1} + \sum_{k = 1}^{s_{1}} a_{k}, \dots, b_{m} + \sum_{k = s_{m - 1} + 1}^{s_{m}} a_{k}\right)$$
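The m-partition density and its normalizing constant can be sketched in Python in the same way. The function names below are hypothetical. The sketch assumes strictly increasing group boundaries, so that no group is empty, and a point $𝐱$ in the interior of the simplex.

```python
import math


def log_multivariate_beta(alpha):
    """log B(alpha) = sum(log Gamma(alpha_i)) - log Gamma(sum(alpha_i))."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def gdd_log_density(x, a, b, s):
    """Log density of the m-partition grouped Dirichlet distribution.

    x -- full vector (x_1, ..., x_n) on the simplex (interior assumed)
    a -- parameters (a_1, ..., a_n)
    b -- parameters (b_1, ..., b_m)
    s -- group boundaries (s_1, ..., s_m) with s_m = n; s_0 = 0 is implied
    """
    x, a, b = list(x), list(a), list(b)
    bounds = [0] + list(s)
    if len(x) != len(a) or len(b) != len(s) or bounds[-1] != len(x):
        raise ValueError("inconsistent dimensions")
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("this sketch assumes strictly increasing boundaries")
    if abs(sum(x) - 1.0) > 1e-9 or min(x) <= 0:
        raise ValueError("x must lie in the interior of the simplex")

    # Kernel: product of x_i^(a_i - 1).
    log_num = sum((ai - 1) * math.log(xi) for xi, ai in zip(x, a))
    log_c = 0.0
    group_totals = []
    for j, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        # Group j covers indices s_{j-1}+1 .. s_j (0-based slice lo:hi).
        log_num += b[j] * math.log(sum(x[lo:hi]))
        log_c += log_multivariate_beta(a[lo:hi])
        group_totals.append(b[j] + sum(a[lo:hi]))
    # Final factor of c_m: beta function of the m group totals.
    log_c += log_multivariate_beta(group_totals)
    return log_num - log_c


# Three groups of sizes 2, 1 and 2 with illustrative parameter values.
print(gdd_log_density([0.1, 0.2, 0.3, 0.15, 0.25],
                      [1.0, 2.0, 1.5, 1.0, 3.0],
                      [0.5, 0.0, 2.0],
                      (2, 3, 5)))
```

With $m = 2$ and $𝐬 = (s, n)$ this reduces to the two-partition density given earlier.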

The authors went on to use these distributions in the context of three different applications in medical science.

References

1. Ng, Kai Wang; et al. (2008). "Grouped Dirichlet distribution: A new tool for incomplete categorical data analysis". Journal of Multivariate Analysis 99: 490–509.


Original source: https://en.wikipedia.org/wiki/Grouped Dirichlet distribution.
