Generates overdispersed binomial outcome data using a three-step algorithm, given the number of frequencies, the binomial random variable, the probability of success, and the overdispersion parameter.
Arguments
- N
single value for number of total frequencies
- n
single value for binomial random variable
- pi
single value for probability of success
- rho
single value for overdispersion parameter
Details
The generated binomial random variables are overdispersed according to \(\rho\) for the probability of success \(\pi\).
Step 1: For given \(n, \pi, \rho\), solve the following equation for \(\delta\): $$\phi(z(\pi), z(\pi), \delta) = \pi(1-\pi)\rho + \pi^2,$$
where \(\phi(z(\pi), z(\pi), \delta)\) is the cumulative distribution function of the standard bivariate normal random variable with correlation coefficient \(\delta\), and \(z(\pi)\) denotes the \(\pi^{th}\) quantile of the standard normal distribution.
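Step 1 can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the helper names `bvn_cdf_equal` and `solve_delta` are hypothetical, the probability is called `p` so that R's built-in constant `pi` is not shadowed, and the bivariate CDF is evaluated by integrating the bivariate normal density over the correlation (a standard identity) rather than by whatever routine `GenerateBOD` uses.

```r
# Bivariate standard normal CDF at (z, z) with correlation delta, using
# Phi2(z, z, delta) = pnorm(z)^2 + integral over r in [0, delta] of the
# bivariate normal density at (z, z) with correlation r.
bvn_cdf_equal <- function(z, delta) {
  if (delta == 0) return(pnorm(z)^2)
  dens <- function(r) exp(-z^2 / (1 + r)) / (2 * base::pi * sqrt(1 - r^2))
  pnorm(z)^2 + integrate(dens, lower = 0, upper = delta, rel.tol = 1e-8)$value
}

# Solve phi(z(p), z(p), delta) = p(1 - p)rho + p^2 for delta.
# Assumption: the root is searched in [0, upper]; very large rho values
# whose root lies beyond `upper` will make uniroot() stop with an error.
solve_delta <- function(p, rho, upper = 0.999) {
  z <- qnorm(p)                          # z(p): p-th standard normal quantile
  target <- p * (1 - p) * rho + p^2      # right-hand side of the equation
  uniroot(function(d) bvn_cdf_equal(z, d) - target,
          interval = c(0, upper), tol = 1e-10)$root
}

delta <- solve_delta(p = 0.5, rho = 0.1)
delta
```

For `p = 0.5` the threshold is 0 and the bivariate CDF has the closed form 1/4 + asin(delta)/(2 * pi), so the root should be close to sin(pi/20), about 0.1564, which gives a quick check on the numerical solution.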
Step 2: Generate \(n\)-dimensional multivariate normal random variables \(Z_i=(Z_{i1},Z_{i2},\ldots,Z_{in})^T\) with mean \(0\) and constant correlation matrix \(\Sigma_i\) for \(i=1,2,\ldots,N\), where the off-diagonal elements \((\Sigma_i)_{lm}\) are \(\delta\) for \(l \ne m\).
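One way to draw such equicorrelated normal vectors in base R is a one-factor construction: each row shares a single standard normal draw, mixed with independent noise. This is a swapped-in technique chosen to avoid extra packages and is valid only for a non-negative correlation; it is not claimed to be how `GenerateBOD` generates its variables. The value of `delta` below is an assumed example, not a computed one.

```r
set.seed(1)

N <- 500        # number of vectors Z_i
n <- 10         # dimension of each vector
delta <- 0.156  # assumed common correlation (must lie in [0, 1] here)

# Z_ij = sqrt(delta) * U_i + sqrt(1 - delta) * E_ij with U_i, E_ij iid N(0, 1)
# has mean 0, unit variance, and correlation delta between any two columns.
shared <- rnorm(N)                               # U_i, one per row
noise  <- matrix(rnorm(N * n), nrow = N, ncol = n)  # E_ij
Z <- sqrt(delta) * shared + sqrt(1 - delta) * noise  # shared recycles down rows

# Average empirical off-diagonal correlation, for a rough sanity check
mean(cor(Z)[upper.tri(diag(n))])
```

The vector `shared` has length `N`, so R's column-major recycling adds the same `U_i` to every entry in row `i`, which is what induces the common correlation.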
Step 3: For each \(j=1,2,\ldots,n\), define \(X_{ij} = 1\) if \(Z_{ij} < z(\pi)\), and \(X_{ij} = 0\) otherwise. It can then be shown that the random variable \(Y_i=\sum_{j=1}^{n} X_{ij}\) is overdispersed relative to the binomial distribution.
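The thresholding and summation in Step 3 can be illustrated on a tiny hand-written matrix. The values in `Z` below are made up purely for illustration (they are not simulated), and `p` stands in for the probability of success so that R's constant `pi` is left untouched.

```r
p <- 0.5
z <- qnorm(p)   # threshold z(p); equals 0 when p = 0.5

# Three illustrative rows Z_i of dimension n = 4 (hypothetical values)
Z <- matrix(c(-1.2,  0.4, -0.3,  0.9,
               0.1,  0.7,  1.5,  0.2,
              -0.5, -2.0, -0.1, -0.8),
            nrow = 3, byrow = TRUE)

X <- (Z < z) + 0L   # X_ij = 1 if Z_ij < z(p), 0 otherwise
Y <- rowSums(X)     # Y_i = sum over j of X_ij
Y
#> [1] 2 0 4
```

Each `Y_i` lies between 0 and `n`; with positively correlated columns in `Z`, rows tend to fall mostly below or mostly above the threshold together, which is the source of the extra variance.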
NOTE: If the input parameters fall outside their valid domains, the function returns error messages describing what needs to be corrected.
References
Manoj C, Wijekoon P, Yapa RD (2013). “The McDonald generalized beta-binomial distribution: A new binomial mixture distribution and simulation based comparison with its nested distributions in handling overdispersion.” International Journal of Statistics and Probability, 2(2), 24.
Examples
N <- 500 # Number of observations
n <- 10 # Dimension of multivariate normal random variables
pi <- 0.5 # Probability threshold
rho <- 0.1 # Dispersion parameter
# Generate overdispersed binomial variables
New_overdispersed_data <- GenerateBOD(N, n, pi, rho)
table(New_overdispersed_data)
#> New_overdispersed_data
#> 0 1 2 3 4 5 6 7 8 9 10
#> 6 18 47 62 68 95 77 67 32 21 7