Abstract
The problem of ecological inference arises when drawing conclusions about individual behavior from aggregate data. Such a situation is frequently encountered in the social sciences and epidemiology. In this dissertation, we propose an incomplete data framework for ecological inference. We also present various models for the ecological inference problem in 2 x 2 tables using the data augmentation approach. Following a brief introduction, this dissertation consists of three interrelated chapters on the incomplete data approach to the ecological inference problem.
In Chapter 2, we formulate the ecological inference problem in 2 x 2 tables as an incomplete data problem in which there is no contextual effect. This framework directly incorporates the deterministic bounds, which contain all the information available from the data. We develop a parametric model that can incorporate covariates and individual-level data. We then propose a nonparametric model based on a Dirichlet process prior, which relaxes arbitrary distributional assumptions. Finally, through simulations and an empirical application, we evaluate the performance of these models in comparison with existing methods.
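As a concrete illustration of the deterministic bounds mentioned above, the sketch below computes the classical Duncan-Davis-style bounds for a single 2 x 2 ecological table. It is a minimal sketch under assumed notation; the function name, variable names, and example numbers are hypothetical and are not taken from the dissertation.

```python
# Hypothetical illustration of the deterministic bounds for one 2 x 2 ecological
# table. Observed: X, the fraction of the precinct in group 1, and T, the overall
# outcome rate. Unknown: beta_1, the outcome rate within group 1 (beta_2 for group 2).

def deterministic_bounds(X, T):
    """Return (lower, upper) bounds on beta_1 implied by the accounting identity
    T = beta_1 * X + beta_2 * (1 - X), using only 0 <= beta_2 <= 1."""
    if X == 0:
        return 0.0, 1.0  # group 1 is absent, so beta_1 is unconstrained
    lower = max(0.0, (T - (1.0 - X)) / X)
    upper = min(1.0, T / X)
    return lower, upper

# Example: a precinct that is 60% group 1 with a 45% overall outcome rate.
print(deterministic_bounds(X=0.6, T=0.45))  # approx (0.083, 0.75)
```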
In Chapter 3, we formally define the ecological inference problem as a coarse data problem and apply the related assumptions and theoretical results to it. In particular, one can identify three key issues affecting ecological inference under this framework: distributional, contextual, and aggregation effects. Through the use of an EM algorithm and its extension, the model can formally quantify the effect of missing information due to aggregation. We then extend the models proposed in Chapter 2 to settings where the data are not coarsened at random. By controlling for the coarsening process, one can draw valid ecological inferences. The chapter concludes with simulations and empirical applications that assess the models' performance.
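To make the role of the EM algorithm concrete, here is a minimal sketch for a deliberately simplified version of the problem: each precinct's two group-specific counts are independent binomials with common rates (p1, p2), only their sum is observed, and the E-step imputes the missing group-1 count. This toy specification, and the function and variable names, are assumptions for illustration only, not the dissertation's model.

```python
import math

def em_ecological(N1, N2, T, iters=200):
    """EM for a toy aggregated-binomial model: in precinct i, Y1 ~ Bin(N1[i], p1)
    and Y2 ~ Bin(N2[i], p2) are unobserved, and only T[i] = Y1 + Y2 is seen."""
    p1, p2 = 0.5, 0.5  # starting values
    for _ in range(iters):
        # E-step: E[Y1 | T, p] under the convolution of the two binomials.
        exp_y1 = []
        for n1, n2, t in zip(N1, N2, T):
            lo, hi = max(0, t - n2), min(n1, t)
            weights = [
                math.comb(n1, y) * p1**y * (1 - p1)**(n1 - y)
                * math.comb(n2, t - y) * p2**(t - y) * (1 - p2)**(n2 - (t - y))
                for y in range(lo, hi + 1)
            ]
            total = sum(weights)
            exp_y1.append(sum(y * w for y, w in zip(range(lo, hi + 1), weights)) / total)
        # M-step: closed-form binomial rate updates.
        p1 = sum(exp_y1) / sum(N1)
        p2 = (sum(T) - sum(exp_y1)) / sum(N2)
    return p1, p2

# Hypothetical aggregate data: group sizes and observed totals for three precincts.
print(em_ecological(N1=[40, 60, 50], N2=[60, 40, 50], T=[55, 52, 48]))
```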
Finally, in Chapter 4, we discuss the computational details for fitting the models introduced in the previous chapters. We use Markov chain Monte Carlo algorithms to estimate the Bayesian models; in particular, we illustrate the Gibbs samplers used for posterior simulation and inference. In addition, we develop EM and SEM algorithms to compute maximum likelihood estimates. In the end, we present a publicly available R package that implements these methods.
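To give a flavor of the data-augmentation Gibbs samplers discussed here, the sketch below alternates between imputing the missing within-precinct counts and drawing the rates from conjugate Beta full conditionals, again for the simplified common-rate toy model used in the previous sketch. The priors, names, and data are illustrative assumptions, not the dissertation's sampler.

```python
import math, random

def gibbs_ecological(N1, N2, T, draws=2000, a=1.0, b=1.0, seed=0):
    """Data-augmentation Gibbs sampler for the toy aggregated-binomial model:
    impute Y1 given (p1, p2, T), then draw p1, p2 from Beta full conditionals."""
    rng = random.Random(seed)
    p1, p2 = 0.5, 0.5
    samples = []
    for _ in range(draws):
        # Augmentation step: draw Y1[i] from its discrete full conditional,
        # proportional to Bin(y; N1, p1) * Bin(T - y; N2, p2).
        y1 = []
        for n1, n2, t in zip(N1, N2, T):
            lo, hi = max(0, t - n2), min(n1, t)
            weights = [
                math.comb(n1, y) * p1**y * (1 - p1)**(n1 - y)
                * math.comb(n2, t - y) * p2**(t - y) * (1 - p2)**(n2 - (t - y))
                for y in range(lo, hi + 1)
            ]
            y1.append(lo + rng.choices(range(hi - lo + 1), weights=weights)[0])
        # Conjugate updates under Beta(a, b) priors on p1 and p2.
        s1, s2 = sum(y1), sum(T) - sum(y1)
        p1 = rng.betavariate(a + s1, b + sum(N1) - s1)
        p2 = rng.betavariate(a + s2, b + sum(N2) - s2)
        samples.append((p1, p2))
    return samples

# Posterior means from the second half of the chain (hypothetical data).
out = gibbs_ecological(N1=[40, 60, 50], N2=[60, 40, 50], T=[55, 52, 48])[1000:]
print(sum(p for p, _ in out) / len(out), sum(q for _, q in out) / len(out))
```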