The excess-zero problem in soil animal count data and choice of appropriate models for statistical inference

abstract

  • Recent studies show that soil animal count data are characterized by the presence of excess zeros and overdispersion, which violate the assumptions of standard statistical tests. Despite this, analyses have consisted mainly of non-parametric tests and log-normal least-squares regression (i.e. ANOVA). Failure to accommodate zero inflation in count data can result in biased estimation of ecological effects, jeopardizing the integrity of the scientific inference. The objective of this study was to compare statistical models for the analysis of soil animal count data and suggest appropriate methods for estimating abundance. The log-normal regression model, linear mixed model (LMM), standard Poisson, Poisson with correction for overdispersion (PCO), negative binomial distribution (NBD), the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models were compared using 12 count data sets of earthworms, millipedes, centipedes, beetles, ants and termites from soils under the miombo woodland and agroforestry systems in eastern Zambia. The NBD with covariates gave a better description of the data in nine out of 12 cases than did the standard Poisson, ZIP and ZINB. The ZIP and ZINB models with covariates gave the best description of earthworm counts from the miombo and millipede counts from agroforestry, respectively. In all cases, the ZIP model was better than the standard Poisson model. The ZINB was inferior to the NBD except for earthworm counts from the miombo and millipede counts in agroforestry. Significance tests based on the PCO, ZIP, NBD and ZINB were more conservative than those based on the standard Poisson model. The 95% confidence intervals computed using the PCO, ZIP, NBD and ZINB were also wider than those computed using least squares, LMM and assuming a Poisson distribution.
It is concluded that for the comparison among habitat types, land-use categories or treatments, the NBD, ZIP and ZINB perform better than the log-normal and Poisson models. Considering the excess-zero problem and significant deviation of soil animal counts from the assumptions of normality and homoscedasticity, the log-normal regression model is inappropriate. Therefore, routine application of the log-normal regression model and non-parametric tests for analysis of soil animal count data with many zeros should be discouraged.
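
The comparison of a standard Poisson model with a zero-inflated Poisson (ZIP) model described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis: the counts are made up for illustration, the models carry no covariates, the ZIP is fitted with a simple EM loop, and AIC is used as the comparison criterion as an assumption (the abstract does not name the criterion). The function names are hypothetical, and the NBD and ZINB models are omitted for brevity.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson: with probability pi the
    count is a structural zero, otherwise it is a Poisson(lam) draw."""
    total = 0.0
    for y in counts:
        if y == 0:
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += (math.log(1 - pi) + y * math.log(lam)
                      - lam - math.lgamma(y + 1))
    return total

def fit_zip(counts, iters=500):
    """Estimate (pi, lam) by EM for an intercept-only ZIP model."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5 * n_zero / n, total / n  # crude starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * loglik

# Hypothetical counts per soil sample (illustrative only, not from the study).
counts = [0] * 12 + [1, 2, 2, 3, 3, 4, 5, 7]

mean = sum(counts) / len(counts)
var = sum((y - mean) ** 2 for y in counts) / (len(counts) - 1)
dispersion = var / mean  # values well above 1 suggest overdispersion

lam_pois = mean  # closed-form Poisson maximum-likelihood estimate
pi_zip, lam_zip = fit_zip(counts)

aic_pois = aic(poisson_loglik(counts, lam_pois), 1)
aic_zip = aic(zip_loglik(counts, pi_zip, lam_zip), 2)

print(f"dispersion index: {dispersion:.2f}")
print(f"Poisson AIC: {aic_pois:.1f}   ZIP AIC: {aic_zip:.1f}")
```

On this toy sample the ZIP model attains the lower AIC, which mirrors the direction of the abstract's finding that the ZIP was better than the standard Poisson model. A real analysis would include covariates and the negative binomial alternatives, typically through an established statistics package rather than hand-written likelihoods.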

publication date

  • 2008