Modeling Bi-Modal Count Data Using COM-Poisson Mixtures

Title: Modeling Bi-Modal Count Data Using COM-Poisson Mixtures
Publication Type: Conference Proceedings
Year of Publication: 2012
Authors: Sur, P., P. Dubey, S. Bose, and G. Shmueli
Conference Name: Applied Statistics
Conference Start Date: September 2012
Conference Location: Ljubljana, Slovenia
Abstract

Bi-modal truncated count distributions are frequently observed in aggregate surveys and ratings when respondents are mixed in their opinion. They also arise in censored count data, where the highest category might create an additional mode. Modeling bi-modal behaviour in count data is useful for various purposes, from comparing shapes of different samples (or questions) to predicting future ratings by new raters. The Poisson distribution is the most common distribution for fitting count data and can be modified to achieve mixtures of truncated Poisson distributions. However, it is suitable only for modeling equi-dispersed distributions. Real-life data often exhibit over- or under-dispersion. In such cases, the Poisson distribution typically does not provide good approximations. Also, the Poisson distribution and even Poisson mixtures are limited in their ability to capture bi-modality. A more flexible alternative is the Conway-Maxwell-Poisson (CMP) distribution. The CMP is a two-parameter generalization of the Poisson distribution that allows for over- and under-dispersion and also includes the Bernoulli and geometric distributions as special cases. Despite being much more flexible, the CMP still cannot capture bi-modality. In this paper, we propose a mixture of CMPs for capturing a wide range of truncated count data, which can exhibit unimodal behaviour as well as bimodal behaviour (with individual components exhibiting equi-, under- or over-dispersion). We present methods for estimating the parameters of a mixture of two CMP distributions using an Expectation-Maximization (EM) approach. Our algorithm introduces a special two-step optimization within the M step to estimate multiple parameters. We examine computational and theoretical issues. The methods are illustrated using simulated data and real data arising from three common scenarios where Poisson mixtures fail to adequately fit the data. We also compare the performance of a Poisson mixture and a CMP mixture in each case to validate the advantage of CMP mixture modelling.
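
To make the model in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard CMP form, in which the probability of a count y is proportional to lambda^y / (y!)^nu, renormalized over a finite support to obtain the truncated version. The function names, the 1-to-10 rating scale, and all parameter values are hypothetical choices for illustration. The last function shows the textbook E-step of EM for a two-component mixture; the special two-step M-step mentioned in the abstract is not reproduced here.

```python
import math

def truncated_cmp_pmf(lam, nu, support):
    """PMF of a CMP(lam, nu) distribution renormalized over `support`.

    Unnormalized weight of y is lam**y / (y!)**nu; nu = 1 gives a
    (truncated) Poisson, nu > 1 under-dispersion, nu < 1 over-dispersion.
    """
    # Compute log-weights for numerical stability: y*log(lam) - nu*log(y!)
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in support]
    shift = max(log_w)
    w = [math.exp(v - shift) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

def cmp_mixture_pmf(p, lam1, nu1, lam2, nu2, support):
    """PMF of the mixture p * CMP(lam1, nu1) + (1 - p) * CMP(lam2, nu2)."""
    f1 = truncated_cmp_pmf(lam1, nu1, support)
    f2 = truncated_cmp_pmf(lam2, nu2, support)
    return [p * a + (1 - p) * b for a, b in zip(f1, f2)]

def responsibilities(p, lam1, nu1, lam2, nu2, support):
    """Standard E-step: P(component 1 | y) for each y in `support`."""
    f1 = truncated_cmp_pmf(lam1, nu1, support)
    f2 = truncated_cmp_pmf(lam2, nu2, support)
    return [p * a / (p * a + (1 - p) * b) for a, b in zip(f1, f2)]

# Hypothetical example: ratings on a 1..10 scale, with an equi-dispersed
# low-rating component and an under-dispersed high-rating component.
support = list(range(1, 11))
mix = cmp_mixture_pmf(0.5, 2.5, 1.0, 70.0, 2.0, support)
for y, prob in zip(support, mix):
    print(y, round(prob, 4))
```

With these illustrative parameters the mixture has one local mode at a low rating and another at a high rating, a shape that no single CMP component can produce, as the abstract notes.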

Contact

Galit Shmuéli
SRITNE Chaired Professor of Data Analytics
Associate Professor of Statistics & Information Systems
Indian School of Business
Gachibowli, Hyderabad 500 032
India

galit.shmueli@gmail.com