
Data Say Nothing at All: A Plea for Honesty in the Packaging of Statistics

Scientists everywhere have been seduced by statistical training into interpreting P-values, statistical significance, and confidence intervals as if those statistics were "what the data say" about associations and effects, and have been encouraged to believe that simple statistical methods "let the data speak for themselves." Both beliefs are fallacious. Without exception, all conventional statistics are computed from a model for data probabilities, and so are what a model says about the data. To claim otherwise involves two logical errors. The first error is the confusion of probabilities of data given hypotheses with probabilities of hypotheses given data; the latter are computed only in Bayesian analyses. This error is equivalent to confusing the sensitivity of a diagnostic test with the test's predictive value, and it is widely discussed. The second error is acknowledged only rarely: any simple method must be based on a simple ("parsimonious") model, which imposes more restrictions than a complex regression generalization of that model. As a consequence, a simple method has more capacity to misrepresent data patterns than its generalization; indeed, data misrepresentation is one major complaint against statistical hypothesis testing. These logical problems may be avoided by integrating data explanation into data analysis, provided one learns to distinguish data from statistics.
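The diagnostic-test analogy can be made concrete with Bayes' theorem. Sensitivity, P(positive | disease), plays the role of a probability of data given a hypothesis. Predictive value, P(disease | positive), plays the role of a probability of a hypothesis given data. The minimal sketch below shows how different the two can be. The function name `predictive_value` and the sensitivity, specificity, and prevalence values are hypothetical and were chosen only for illustration.

```python
def predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Return P(disease | positive test) using Bayes' theorem.

    sensitivity = P(positive | disease), i.e. data given hypothesis
    specificity = P(negative | no disease)
    prevalence  = P(disease), the prior probability of the hypothesis
    """
    true_pos = sensitivity * prevalence                  # P(positive and disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(positive and no disease)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical illustrative values, not taken from any study.
    sens, spec, prev = 0.95, 0.95, 0.01
    ppv = predictive_value(sens, spec, prev)
    print(f"P(positive | disease) = {sens:.2f}")  # 0.95
    print(f"P(disease | positive) = {ppv:.3f}")   # 0.161
```

Under these assumed inputs, a test that detects 95% of cases still leaves only about a 16% chance of disease after a positive result. Reading the first number as if it were the second is the same inversion as reading a P-value as the probability of the test hypothesis.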