Systematic review of statistical ability measures
DOI:
https://doi.org/10.52041/serj.801
Keywords:
Statistical ability, Systematic review, Measurement, Reliability, Validity
Abstract
This systematic review investigates measures of statistical ability in published literature to understand how statistical ability has been conceptualised and assessed. The review examines the components, reliability, and validity of these measures, as well as their correlations with cognitive (e.g., intelligence) and non-cognitive (e.g., attitude towards statistics) factors. From 51 papers, 25 unique measures were identified, with 60% assessing knowledge-based competencies. The validity evidence suggests that these measures assess their intended learning outcomes. Correlations between the measures and cognitive factors were stronger when the cognitive factor was closely aligned with the assessed ability. Research reporting correlations between statistical ability measures and non-cognitive factors is relatively limited. The review aims to inform educators and provide direction for future measurement development to address the identified gaps in the literature.