Confidence Intervals and Smallest Worthwhile Change Are Not a Panacea: A Response to the International Society of Physiotherapy Journal Editors

Matthew S. Tenan
https://orcid.org/0000-0003-2215-0846
Aaron R. Caldwell
https://orcid.org/0000-0002-4541-6283

Abstract

Recently, a group of editors from physiotherapy journals wrote a joint editorial on the use of statistics in their journals. As in many editorials before it, the editors, who were not statistical experts themselves, put forth numerous recommendations to physiotherapy researchers on how to conduct and report their statistical analyses. This editorial unfortunately suffers from numerous mischaracterizations or outright falsehoods regarding statistics. On thorough review, two major issues recur throughout the editorial. First, the editors incorrectly state that the use of confidence intervals (CI) would alleviate some of the issues with significance testing. Second, the editors incorrectly assume “smallest worthwhile change” statistics are immutable facts related to some ground truth of treatment effects. In this critical review, we briefly outline some of the problematic statements made by the editors, point out why it is premature to adopt an estimation approach relying on a minimal clinically relevant difference, and offer some simple alternatives that we believe are statistically sound and easy for the average physiotherapy researcher to implement.

Article Details

How to Cite
Tenan, M., & Caldwell, A. (2022). Confidence Intervals and Smallest Worthwhile Change Are Not a Panacea: A Response to the International Society of Physiotherapy Journal Editors. Communications in Kinesiology, 1(4). https://doi.org/10.51224/cik.2022.45 (Original work published September 9, 2022)
Section
Metascience
