Systematic review of statistical ability measures

Authors

DOI:

https://doi.org/10.52041/serj.801

Keywords:

Statistical ability, Systematic review, Measurement, Reliability, Validity

Abstract

This systematic review investigates measures of statistical ability in the published literature to understand how statistical ability has been conceptualised and assessed. The review examines the components, reliability, and validity of these measures, as well as their correlations with cognitive (e.g., intelligence) and non-cognitive (e.g., attitude towards statistics) factors. From 51 papers, 25 unique measures were identified, 60% of which assess knowledge-based competencies. The validity evidence suggests that these measures assess their intended learning outcomes. Correlations between the measures and cognitive factors were stronger when the cognitive factor was closely aligned with the assessed ability. Research reporting correlations between statistical ability measures and non-cognitive factors remains relatively limited. The review aims to inform educators and to provide direction for future measurement development that addresses the identified gaps in the literature.

Published

2026-04-28

Section

Regular Articles