ecoNP {eco} | R Documentation |
ecoNP
is used to fit the nonparametric Bayesian model (based on
a Dirichlet process prior) for ecological inference in 2 x 2
tables via Markov chain Monte Carlo. It gives the in-sample
predictions as well as out-of-sample predictions for population
inference. The models and algorithms are described in Imai, Lu and
Strauss (2008, Forthcoming).
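For orientation, the deterministic structure of a 2 x 2 ecological table can be sketched in base R. The sketch assumes the standard accounting identity Y = X W_1 + (1 - X) W_2 with W_1 and W_2 in [0, 1], which this page does not state explicitly. The helper tomography_bounds is hypothetical and is not part of the eco package.

```r
# Hypothetical helper (not part of the eco package): deterministic bounds on
# W1 and W2 implied by the assumed identity Y = X * W1 + (1 - X) * W2,
# with both W1 and W2 restricted to [0, 1].
tomography_bounds <- function(X, Y) {
  stopifnot(all(X > 0 & X < 1), all(Y >= 0 & Y <= 1))
  W1min <- pmax(0, (Y - (1 - X)) / X)
  W1max <- pmin(1, Y / X)
  W2min <- pmax(0, (Y - X) / (1 - X))
  W2max <- pmin(1, Y / (1 - X))
  cbind(W1min, W1max, W2min, W2max)
}

# Made-up margins for three units
X <- c(0.2, 0.5, 0.8)
Y <- c(0.3, 0.5, 0.9)
b <- tomography_bounds(X, Y)
print(round(b, 3))
```

Each row gives the interval within which the unknown cells of one unit must lie; the model's job is to say where on that segment (the tomography line) the cells are likely to fall.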
ecoNP(formula, data = parent.frame(), N = NULL, supplement = NULL, context = FALSE, mu0 = 0, tau0 = 2, nu0 = 4, S0 = 10, alpha = NULL, a0 = 1, b0 = 0.1, parameter = FALSE, grid = FALSE, n.draws = 5000, burnin = 0, thin = 0, verbose = FALSE)
formula |
A symbolic description of the model to be fit,
specifying the column and row margins of 2 x 2
ecological tables. Y ~ X specifies Y as the
column margin (e.g., turnout) and X as the row margin
(e.g., percent African-American). Details and specific examples
are given below.
|
data |
An optional data frame in which to interpret the variables
in formula . The default is the environment in which
ecoNP is called.
|
N |
An optional variable representing the size of the unit; e.g.,
the total number of voters. N needs to be a vector of the same length
as Y and X or a scalar. |
supplement |
An optional matrix of supplemental data. The matrix
has two columns, which contain additional individual-level data such
as survey data for W_1 and W_2, respectively. If
NULL , no additional individual-level data are included in the
model. The default is NULL .
|
context |
Logical. If TRUE , the contextual effect is also
modeled; that is, the row margin X and the unknown
W_1 and W_2 are assumed to be correlated. See Imai, Lu and Strauss
(2008, Forthcoming) for details. The default is FALSE .
|
mu0 |
A scalar or a numeric vector that specifies the prior mean
for the mean parameter μ of the base prior distribution G_0
(see Imai, Lu and Strauss (2008, Forthcoming) for detailed
descriptions of the Dirichlet process prior and the normal base prior distribution).
If it is a scalar, its value is repeated to yield a vector
of the length of μ; otherwise,
it needs to be a vector of the same length as μ.
When context=TRUE , the length of μ is 3,
otherwise it is 2. The default is 0 .
|
tau0 |
A positive integer representing the scale parameter of the
Normal-Inverse Wishart prior for the mean and variance parameter
(μ_i, Σ_i) of each observation. The default is 2 . |
nu0 |
A positive integer representing the prior degrees of
freedom of the variance matrix Σ_i. The default is 4 .
|
S0 |
A positive scalar or a positive definite matrix that specifies
the prior scale matrix for the variance matrix Σ_i. If it is
a scalar, then the prior scale matrix will be a diagonal matrix with
the same dimensions as Σ_i whose diagonal elements all
take the value of S0 ; otherwise, S0 needs to have the same
dimensions as Σ_i. When context=TRUE , Σ is a
3 x 3 matrix; otherwise, it is 2 x 2.
The default is 10 .
|
alpha |
A positive scalar representing a user-specified fixed
value of the concentration parameter, α. If NULL ,
α will be updated at each Gibbs draw, and its prior
parameters a0 and b0 need to be specified. The default
is NULL .
|
a0 |
A positive integer representing the value of shape parameter
of the gamma prior distribution for α. The default is 1 .
|
b0 |
A positive scalar representing the value of the scale
parameter of the gamma prior distribution for α. The
default is 0.1 .
|
parameter |
Logical. If TRUE , the Gibbs draws of the population
parameters, μ and Σ, are returned in addition to
the in-sample predictions of the missing internal cells,
W. The default is FALSE . This needs to be set to
TRUE if one wishes to make population inferences through
predict.eco . See an example below.
|
grid |
Logical. If TRUE , the grid method is used to sample
W in the Gibbs sampler. If FALSE , the Metropolis
algorithm is used where candidate draws are sampled from the uniform
distribution on the tomography line for each unit. Note that the
grid method is significantly slower than the Metropolis algorithm.
The default is FALSE .
|
n.draws |
A positive integer. The number of MCMC draws.
The default is 5000 .
|
burnin |
A non-negative integer. The burn-in interval for the Markov
chain; i.e., the number of initial draws that should not be stored. The
default is 0 .
|
thin |
A non-negative integer. The thinning interval for the
Markov chain; i.e., the number of Gibbs draws between the recorded
values that are skipped. The default is 0 .
|
verbose |
Logical. If TRUE , the progress of the Gibbs
sampler is printed to the screen. The default is FALSE .
|
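The expansion rules for mu0 and S0 described above can be sketched in base R. The helper expand_prior is hypothetical and written only for illustration; it is not part of the eco package and does not reproduce the package's internal code.

```r
# Hypothetical helper (not part of the eco package): expand scalar prior
# arguments to the dimensions described for mu0 and S0.
expand_prior <- function(mu0 = 0, S0 = 10, context = FALSE) {
  d <- if (context) 3 else 2            # length of mu, dimension of Sigma
  if (length(mu0) == 1) mu0 <- rep(mu0, d)
  if (length(S0) == 1) S0 <- diag(S0, d) # d x d diagonal matrix
  stopifnot(length(mu0) == d,
            is.matrix(S0), all(dim(S0) == c(d, d)))
  list(mu0 = mu0, S0 = S0)
}

# Defaults without contextual effects: 2-vector and 2 x 2 matrix
p2 <- expand_prior()

# With contextual effects: 3-vector and 3 x 3 matrix
p3 <- expand_prior(mu0 = c(0, 0, 0.5), S0 = 5, context = TRUE)
print(p3$S0)
```

Supplying a vector or matrix of the wrong size fails the dimension check, which mirrors the requirement that non-scalar inputs match the dimensions of μ and Σ.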
An object of class ecoNP
containing the following elements:
call |
The matched call. |
X |
The row margin, X. |
Y |
The column margin, Y. |
burnin |
The number of initial burnin draws. |
thin |
The thinning interval. |
nu0 |
The prior degrees of freedom. |
tau0 |
The prior scale parameter. |
mu0 |
The prior mean. |
S0 |
The prior scale matrix. |
a0 |
The prior shape parameter. |
b0 |
The prior scale parameter. |
W |
A three-dimensional array storing the posterior in-sample predictions of W. The first dimension indexes the Monte Carlo draws, the second dimension indexes the columns of the table, and the third dimension represents the observations. |
Wmin |
A numeric matrix storing the lower bounds of W. |
Wmax |
A numeric matrix storing the upper bounds of W. |
mu |
A three-dimensional array storing the posterior draws of the population mean parameter, μ. The first dimension indexes the Monte Carlo draws, the second dimension indexes the columns of the table, and the third dimension represents the observations. |
Sigma |
A three-dimensional array storing the posterior draws of the population variance matrix, Σ. The first dimension indexes the Monte Carlo draws, the second dimension indexes the parameters, and the third dimension represents the observations. |
alpha |
The posterior draws of α. |
nstar |
The number of clusters at each Gibbs draw. |
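The array layout described for W (draws by columns by observations) lends itself to summaries with apply. The following sketch uses a simulated array as a stand-in for the W element of a fitted object, so the numbers are meaningless; only the indexing is of interest. The object names are illustrative.

```r
# Simulated stand-in for the W element: draws x 2 columns x observations.
# This is NOT output from ecoNP; it only mimics the documented layout.
set.seed(1)
n.store <- 100
n.units <- 4
W <- array(runif(n.store * 2 * n.units), dim = c(n.store, 2, n.units))

# Posterior mean of W1 and W2 for each observation (units in rows)
post.mean <- t(apply(W, c(2, 3), mean))
colnames(post.mean) <- c("W1", "W2")

# 95% posterior interval of W1 for each observation (units in columns)
ci.W1 <- apply(W[, 1, ], 2, quantile, probs = c(0.025, 0.975))

print(round(post.mean, 3))
print(round(ci.W1, 3))
```

With a real fit, replacing the simulated array by the W element of the returned object should give the corresponding in-sample summaries, assuming the layout documented above.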
Kosuke Imai, Department of Politics, Princeton University, kimai@Princeton.Edu, http://imai.princeton.edu; Ying Lu, Department of Humanities and Social Sciences in the Professions, Steinhardt School of Culture, Education and Human Development, New York University, yl46@Nyu.Edu
Imai, Kosuke, Ying Lu and Aaron Strauss. (Forthcoming). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, available at http://imai.princeton.edu/research/eco.html
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69, available at http://imai.princeton.edu/research/eiall.html
eco, ecoML, predict.eco, summary.ecoNP
## load the package and the registration data
library(eco)
data(reg)

## NOTE: We set the number of MCMC draws to be a very small number in
## the following examples; i.e., convergence has not been properly
## assessed. See Imai, Lu and Strauss (2008, Forthcoming) for more
## complete examples.

## fit the nonparametric model to give in-sample predictions
## store the parameters to make population inference later
res <- ecoNP(Y ~ X, data = reg, n.draws = 50, parameter = TRUE,
             verbose = TRUE)

## summarize the results
summary(res)

## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)

## summarize the results
summary(out)

## density plots of the out-of-sample predictions
par(mfrow = c(2, 1))
plot(density(out[, 1]), main = "W1")
plot(density(out[, 2]), main = "W2")

## load the Robinson's census data
data(census)

## fit the nonparametric model with contextual effects and N
## using the default prior specification
res1 <- ecoNP(Y ~ X, N = N, context = TRUE, parameter = TRUE,
              data = census, n.draws = 25, verbose = TRUE)

## summarize the results
summary(res1)

## out-of-sample prediction
pres1 <- predict(res1)
summary(pres1)
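For longer runs than the ones above, the burnin and thin arguments determine how many draws are stored. The following base R sketch illustrates one reading of their descriptions: the first burnin draws are dropped, and thin draws are then skipped before each stored draw. The helper stored_draws is hypothetical, and the package's exact bookkeeping may differ.

```r
# Hypothetical illustration, not the eco package's internal bookkeeping.
# Assumes the first `burnin` draws are dropped and `thin` draws are skipped
# before each stored draw, so every (thin + 1)-th draw is kept.
stored_draws <- function(n.draws, burnin = 0, thin = 0) {
  stopifnot(n.draws > burnin, burnin >= 0, thin >= 0)
  keep <- thin + 1
  if (n.draws - burnin < keep) return(integer(0))
  seq(from = burnin + keep, to = n.draws, by = keep)
}

# Defaults: every one of the 5000 draws is stored
length(stored_draws(5000))

# Drop 1000 draws, then keep every 5th draw: 800 stored draws
idx <- stored_draws(5000, burnin = 1000, thin = 4)
length(idx)
head(idx)
```

Under this reading, the number of stored draws equals floor((n.draws - burnin) / (thin + 1)), which is a quick way to check whether a planned run leaves enough draws for inference.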