- Please, #microbiome and #sequencing data are NOT zero-inflated. Let's stop repeating this nonsense. Zero-inflated compared to what?? Those zeroes carry important information about abundance and sequencing depth, and are not "inflated" in any sense. 1/6
- In fact, if you look at blanks and other control data, you see a lot of incorrect detections. There's better evidence that microbiome data is NON-ZERO inflated than zero-inflated. 2/6
- Don't get me started on overdispersed, let alone compositional. Microbiome data is none of these, and I'm not new to this field. 3/6
- Why does this matter? This sort of thinking leads biologists to trust estimators based on highly parametrised models that (1) are surely misspecified and (2) have terrible properties under misspecification. 4/6
- Saying "microbiome data is zero-inflated" leads people to seek out "zero-inflated models." Usually, these are bad methods with bad properties. Stay away. 5/6
- I leave you with the StatDivLab mantra:
  1. choose something meaningful to estimate
  2. choose a sensible way to estimate it
  3. choose tests that control Type 1 error

  That's what we will keep doing, even if anonymous reviewers insist on buzzwords. 6/6
- I hope you're all having a better week than me. 😽😽 7/6
- How do I describe data with a lot of zeroes? Sparse. How do I describe data with a lot of variance? High-variance. How do I describe data where the totals convey complex information about an unknown quantity I care about (abundance)? I don't. I just state my assumptions.
May 6, 2025 19:25
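
One way to make the "zero-inflated compared to what?" question concrete is a minimal sketch under assumptions that are not in the thread: counts for a single taxon are taken to be Poisson with mean `depth * rel_abundance`, and the relative abundance and sequencing depths below are hypothetical values picked only for illustration. The sketch compares the expected share of zeroes across samples with different depths against a single Poisson that ignores depth. No zero-inflation component is involved anywhere.

```python
import math


def poisson_zero_prob(depth: int, rel_abundance: float) -> float:
    """P(count == 0) when count ~ Poisson(depth * rel_abundance)."""
    return math.exp(-depth * rel_abundance)


# Hypothetical values, chosen only for illustration.
rel_abundance = 1e-4                        # taxon is 0.01% of the community
depths = [1_000, 5_000, 20_000, 100_000]    # reads per sample

# Zero probability in each sample, driven only by abundance and depth.
per_sample_zero = [poisson_zero_prob(d, rel_abundance) for d in depths]

# Expected share of zeroes across the four samples.
pooled_zero = sum(per_sample_zero) / len(per_sample_zero)

# Reference model: one Poisson with the same average count, depth ignored.
mean_count = rel_abundance * sum(depths) / len(depths)
reference_zero = math.exp(-mean_count)

for d, p0 in zip(depths, per_sample_zero):
    print(f"depth={d:>7}  P(zero)={p0:.4f}")
print(f"expected share of zeroes across samples: {pooled_zero:.4f}")
print(f"share predicted by a depth-blind Poisson: {reference_zero:.4f}")
```

Under these assumptions the pooled share of zeroes exceeds what the depth-blind Poisson predicts, even though every zero here comes from low abundance and shallow depth. The apparent "excess" depends entirely on the reference model chosen, which is one reading of why the zeroes are described above as carrying information about abundance and sequencing depth.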