Modeling Bi-Modal Count Data Using COM-Poisson Mixtures
| Title | Modeling Bi-Modal Count Data Using COM-Poisson Mixtures |
| Publication Type | Conference Proceedings |
| Year of Publication | 2012 |
| Authors | Sur, P., P. Dubey, S. Bose, and G. Shmueli |
| Conference Name | Applied Statistics |
| Conference Start Date | September 2012 |
| Conference Location | Ljubljana, Slovenia |
| Abstract | Bi-modal truncated count distributions are frequently observed in aggregate surveys and ratings when respondents are mixed in their opinion. They also arise in censored count data, where the highest category might create an additional mode. Modeling bi-modal behaviour in count data is useful for various purposes, from comparing shapes of different samples (or questions) to predicting future ratings by new raters. The Poisson distribution is the most common distribution for fitting count data and can be modified to achieve mixtures of truncated Poisson distributions. However, it is suitable only for modeling equi-dispersed distributions. Real-life data often exhibit over- or under-dispersion. In such cases, the Poisson distribution typically does not provide good approximations. Also, the Poisson distribution and even Poisson mixtures are limited in their ability to capture bi-modality. A more flexible alternative is the Conway-Maxwell-Poisson (CMP) distribution. The CMP is a two-parameter generalization of the Poisson distribution that allows for over- and under-dispersion and also includes the Bernoulli and geometric distributions as special cases. Despite being much more flexible, the CMP still cannot capture bi-modality. In this paper, we propose a mixture of CMPs for capturing a wide range of truncated count data, which can exhibit unimodal behaviour as well as bi-modal behaviour (with individual components exhibiting equi-, under- or over-dispersion). We present methods for estimating the parameters of a mixture of two CMP distributions using an Expectation-Maximization (EM) approach. Our algorithm introduces a special two-step optimization within the M step to estimate multiple parameters. We examine computational and theoretical issues. The methods are illustrated using simulated data and real data arising from three common scenarios where mixtures of Poisson distributions fail to adequately fit the data. We also compare the performance of a Poisson mixture and a CMP mixture in each case to validate the advantage of CMP mixture modelling. |
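
The abstract's central idea, that a single CMP is unimodal while a mixture of two CMPs can be bi-modal, can be illustrated with a minimal Python sketch. It uses the standard CMP form, in which P(X = x) is proportional to λ^x / (x!)^ν, restricted to the counts 0..max_count. The support, the function names, and the parameter values are assumptions chosen for illustration and are not taken from the paper. The E step and the mixing-weight update shown here are the generic EM formulas for a two-component mixture; the paper's two-step optimization for the component parameters (λ, ν) is not described in the abstract and is not reproduced.

```python
import math


def truncated_cmp_pmf(lam, nu, max_count):
    """PMF of a CMP(lam, nu) variable restricted to 0..max_count (lam > 0)."""
    # Unnormalized log weight of x is x*log(lam) - nu*log(x!).
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1)
             for x in range(max_count + 1)]
    peak = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - peak) for v in log_w]
    total = sum(w)  # finite normalizing constant on the truncated support
    return [v / total for v in w]


def mixture_pmf(p, comp1, comp2, max_count):
    """PMF of p * CMP(comp1) + (1 - p) * CMP(comp2); comps are (lam, nu)."""
    f1 = truncated_cmp_pmf(comp1[0], comp1[1], max_count)
    f2 = truncated_cmp_pmf(comp2[0], comp2[1], max_count)
    return [p * a + (1 - p) * b for a, b in zip(f1, f2)]


def count_modes(pmf):
    """Count strict local maxima of a PMF, treating endpoints as candidates."""
    modes = 0
    for i, v in enumerate(pmf):
        left = pmf[i - 1] if i > 0 else -1.0
        right = pmf[i + 1] if i < len(pmf) - 1 else -1.0
        if v > left and v > right:
            modes += 1
    return modes


def e_step(counts, p, comp1, comp2, max_count):
    """Posterior probability that each observation came from component 1."""
    f1 = truncated_cmp_pmf(comp1[0], comp1[1], max_count)
    f2 = truncated_cmp_pmf(comp2[0], comp2[1], max_count)
    return [p * f1[x] / (p * f1[x] + (1 - p) * f2[x]) for x in counts]


def update_weight(responsibilities):
    """Closed-form M-step update for the mixing weight p."""
    return sum(responsibilities) / len(responsibilities)


if __name__ == "__main__":
    # Illustrative values only: an equi-dispersed component (nu = 1) with a
    # low mode and an under-dispersed component (nu = 2) with a high mode.
    low, high, top = (1.5, 1.0), (70.0, 2.0), 10
    print("modes, low component: ", count_modes(truncated_cmp_pmf(*low, top)))
    print("modes, high component:", count_modes(truncated_cmp_pmf(*high, top)))
    print("modes, 50/50 mixture: ", count_modes(mixture_pmf(0.5, low, high, top)))
```

Setting ν = 1 gives a truncated Poisson, and ν = 0 gives a truncated geometric shape with success ratio λ, which matches the special cases the abstract mentions. A full fit would alternate `e_step` with an M step that also re-estimates each component's (λ, ν), typically by numerically maximizing the responsibility-weighted log-likelihood.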