Grouped Dirichlet distribution
In statistics, the grouped Dirichlet distribution (GDD) is a multivariate generalization of the Dirichlet distribution. It was first described by Ng et al. in 2008.[1] The grouped Dirichlet distribution arises in the analysis of categorical data where some observations are known only to fall into one of a set of 'crisp' categories, without the exact category being recorded. For example, one may have a data set consisting of cases and controls under two different conditions (here, treatment and no treatment). With complete data, the cross-classification of disease status by condition forms a 2 × 2 table (case/control by treatment/no treatment) with cell probabilities
| | Treatment | No Treatment |
| Controls | θ1 | θ2 |
| Cases | θ3 | θ4 |
If, however, the data include non-respondents who are known to be controls or cases but whose treatment status is missing, then the cross-classification forms a 2 × 3 table. In each row, the probability attached to the last column is the sum of the probabilities of the first two columns, e.g.
| | Treatment | No Treatment | Missing |
| Controls | θ1 | θ2 | θ1+θ2 |
| Cases | θ3 | θ4 | θ3+θ4 |
The GDD allows the full estimation of the cell probabilities under such aggregation conditions.[1]
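As a sketch of why sums of cell probabilities enter the analysis, suppose (purely for illustration; this notation is not taken from the source) that [math]\displaystyle{ n_1,\ldots,n_4 }[/math] fully classified observations fall in the cells with probabilities θ1, …, θ4, that [math]\displaystyle{ n_{12} }[/math] controls and [math]\displaystyle{ n_{34} }[/math] cases have missing treatment status, and that the missingness can be ignored. Under these assumptions the likelihood of the observed counts is proportional to
- [math]\displaystyle{ \left(\prod_{i=1}^4 \theta_i^{n_i}\right)\cdot \left(\theta_1+\theta_2\right)^{n_{12}}\cdot \left(\theta_3+\theta_4\right)^{n_{34}} }[/math]
The last two factors are powers of group sums, which an ordinary Dirichlet density cannot represent.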
Probability Distribution
Consider the closed simplex set [math]\displaystyle{ \mathcal{T}_n=\left\{\left(x_1,\ldots,x_n\right)\left|x_i\geq 0, i=1,\cdots,n, \sum_{i=1}^n x_i =1\right.\right\} }[/math] and [math]\displaystyle{ \mathbf{x}\in\mathcal{T}_n }[/math]. Writing [math]\displaystyle{ \mathbf{x}_{-n}=\left(x_1,\ldots,x_{n-1}\right) }[/math] for the first [math]\displaystyle{ n-1 }[/math] elements of a member of [math]\displaystyle{ \mathcal{T}_n }[/math], the grouped Dirichlet distribution of [math]\displaystyle{ \mathbf{x} }[/math] with two partitions, in which the first [math]\displaystyle{ s }[/math] components form one group and the remaining [math]\displaystyle{ n-s }[/math] form the other, has a density function given by
- [math]\displaystyle{ \operatorname{GD}_{n,2,s}\left(\left.\mathbf{x}_{-n}\right|\mathbf{a},\mathbf{b}\right)= \frac{ \left(\prod_{i=1} ^n x_i^{a_i-1}\right)\cdot \left(\sum_{i=1} ^s x_i \right)^{b_1}\cdot \left(\sum_{i=s+1} ^n x_i \right)^{b_2} }{ \operatorname{\Beta}\left(a_1,\ldots,a_s\right)\cdot \operatorname{\Beta}\left(a_{s+1},\ldots,a_n\right)\cdot \operatorname{\Beta}\left(b_1+\sum_{i=1}^sa_i,b_2+\sum_{i=s+1}^n a_i\right) } }[/math]
where [math]\displaystyle{ \operatorname{\Beta}\left(\mathbf{a}\right) }[/math] is the multivariate beta function.
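The two-partition density can be evaluated numerically on the log scale. The following is a minimal sketch, not code from Ng et al.: the function names `log_multivariate_beta` and `gdd2_logpdf` are hypothetical, and for simplicity it assumes the point lies strictly inside the simplex so that every logarithm is finite.

```python
import math


def log_multivariate_beta(alphas):
    """Log of the multivariate beta function: sum(lgamma(a_i)) - lgamma(sum(a_i))."""
    return sum(math.lgamma(v) for v in alphas) - math.lgamma(sum(alphas))


def gdd2_logpdf(x, a, b, s):
    """Log density of the two-partition grouped Dirichlet distribution.

    x : full vector (x_1, ..., x_n), strictly positive and summing to 1
        (interior points only; an assumption of this sketch)
    a : n positive parameters a_1, ..., a_n
    b : pair (b_1, b_2) of group-level parameters
    s : size of the first group, with 1 <= s < n
    """
    n = len(x)
    if len(a) != n or not 1 <= s < n:
        raise ValueError("need len(a) == len(x) and 1 <= s < n")
    if min(x) <= 0 or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must lie strictly inside the simplex")

    # Numerator: product of x_i^(a_i - 1) times the two group sums raised to b_1, b_2.
    log_num = sum((ai - 1.0) * math.log(xi) for ai, xi in zip(a, x))
    log_num += b[0] * math.log(sum(x[:s])) + b[1] * math.log(sum(x[s:]))

    # Denominator: one beta function per group, plus one across the group totals.
    log_den = (
        log_multivariate_beta(a[:s])
        + log_multivariate_beta(a[s:])
        + log_multivariate_beta([b[0] + sum(a[:s]), b[1] + sum(a[s:])])
    )
    return log_num - log_den
```

Setting both entries of `b` to zero makes the three beta functions in the denominator collapse to a single multivariate beta function of all the `a` values, so the result agrees with the ordinary Dirichlet log density. This gives a convenient sanity check.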
Ng et al.[1] went on to define an m-partition grouped Dirichlet distribution with density of [math]\displaystyle{ \mathbf{x}_{-n} }[/math] given by
- [math]\displaystyle{ \operatorname{GD}_{n,m,\mathbf{s}}\left(\left.\mathbf{x}_{-n}\right|\mathbf{a},\mathbf{b}\right) = c_m^{-1}\cdot \left(\prod_{i=1}^n x_i^{a_i-1}\right)\cdot \prod_{j=1}^m\left(\sum_{k=s_{j-1}+1}^{s_j}x_k\right)^{b_j} }[/math]
where [math]\displaystyle{ \mathbf{s} = \left(s_1,\ldots,s_m\right) }[/math] is a vector of integers with [math]\displaystyle{ 0=s_0\lt s_1\leqslant\cdots\leqslant s_m=n }[/math]. The normalizing constant is given by
- [math]\displaystyle{ c_m=\left\{\prod_{j=1}^m\operatorname{\Beta}\left(a_{s_{j-1}+1},\ldots,a_{s_j}\right)\right\}\cdot \operatorname{\Beta}\left(b_1+\sum_{k=1}^{s_1}a_k,\ldots,b_m+\sum_{k=s_{m-1}+1}^{s_m}a_k\right) }[/math]
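The normalizing constant and the m-partition density translate directly into code. The sketch below is illustrative only: the names `gdd_log_norm_const` and `gdd_logpdf` are hypothetical, the cut points are passed as the list `s = [s_1, ..., s_m]`, and, as a simplifying assumption that is stricter than the inequalities stated above, every group is required to be non-empty and the point to lie strictly inside the simplex.

```python
import math


def log_multivariate_beta(alphas):
    """Log of the multivariate beta function: sum(lgamma(a_i)) - lgamma(sum(a_i))."""
    return sum(math.lgamma(v) for v in alphas) - math.lgamma(sum(alphas))


def gdd_log_norm_const(a, b, s):
    """Log of c_m for cut points s = [s_1, ..., s_m], where s_m must equal len(a)."""
    bounds = [0] + list(s)
    if bounds[-1] != len(a) or len(b) != len(s):
        raise ValueError("need s[-1] == len(a) and len(b) == len(s)")
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("this sketch requires strictly increasing cut points")

    # Group j holds a_{s_{j-1}+1}, ..., a_{s_j} (0-based slice a[lo:hi]).
    groups = [a[lo:hi] for lo, hi in zip(bounds, bounds[1:])]

    # One beta function per group, then one across the shifted group totals.
    log_c = sum(log_multivariate_beta(g) for g in groups)
    log_c += log_multivariate_beta([bj + sum(g) for bj, g in zip(b, groups)])
    return log_c


def gdd_logpdf(x, a, b, s):
    """Log density of the m-partition grouped Dirichlet at an interior point x."""
    if len(x) != len(a) or min(x) <= 0 or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must match a in length and lie strictly inside the simplex")
    log_c = gdd_log_norm_const(a, b, s)  # also validates the cut points
    bounds = [0] + list(s)
    log_kernel = sum((ai - 1.0) * math.log(xi) for ai, xi in zip(a, x))
    for bj, lo, hi in zip(b, bounds, bounds[1:]):
        log_kernel += bj * math.log(sum(x[lo:hi]))
    return log_kernel - log_c
```

With `s = [s_1, n]` this reproduces the two-partition case, and with all `b` values equal to zero the constant reduces to the multivariate beta function of the full parameter vector `a`, whatever the grouping.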
The authors went on to use these distributions in the context of three different applications in medical science.
References
Original source: https://en.wikipedia.org/wiki/Grouped Dirichlet distribution.