Investigative questions with secondary data: Characterizing high school students’ questions and the role of data visualization in refinement

Authors

  • Hyejin Jun, Seoul National University
  • Kyeonghwa Lee, Seoul National University

DOI:

https://doi.org/10.52041/serj.820

Keywords:

Statistics education research, Statistical investigation, Investigative questions, Secondary data, Data visualization, High school students

Abstract

We investigated how high school students formulate and refine investigative questions when conducting a statistical investigation with secondary data. The data consisted of students' written activity reports and email-based post-interview responses collected after a seven-session instructional sequence in which CODAP served as the primary tool for multivariate data analysis. We distinguished initiating investigative questions (IIQs) from analysis-phase investigative questions (AIQs) implied in students' analysis plans, and characterized both sets along five analytical components: variables, clarity of population, intent, feasibility of drawing conclusions from the data, and global view of data. We then used thematic analysis to examine how data visualization appeared to be involved at points where IIQ-to-AIQ refinement was evident. Compared with IIQs, AIQs more often specified a larger number of clearly defined variables and more often took forms from which conclusions could feasibly be drawn from the given dataset. Across the three episodes analyzed, the representational and exploratory functions of data visualizations appeared to support refinement by helping students operationalize everyday terms, narrow populations, anticipate interpretable relationships, and set aside uninformative variables. This study offers classroom-based empirical insights into investigative questions posed with secondary data and into the potential role of data visualization in their refinement.

References

Allmond, S., & Makar, K. (2010). Developing primary students’ ability to pose questions in statistical investigations. In C. Reading (Ed.), Data and context in statistics education: Towards an evidence-based society. Proceedings of the eighth International Conference on Teaching Statistics (ICOTS8, July 2010), Ljubljana, Slovenia. International Statistical Institute. https://icots.info/icots/8/cd/pdfs/invited/ICOTS8_8A1_ALLMOND.pdf

Arnold, P. M. (2013). Statistical investigative questions: An enquiry into posing and answering investigative questions from existing data [Doctoral dissertation, University of Auckland]. https://researchspace.auckland.ac.nz/handle/2292/21305

Arnold, P., & Franklin, C. (2021). What makes a good statistical question? Journal of Statistics and Data Science Education, 29(1), 122–130. https://doi.org/10.1080/26939169.2021.1877582

Bakker, A., & Gravemeijer, K. P. (2004). Learning to reason about distribution. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 147–168). Springer. https://doi.org/10.1007/1-4020-2278-6

Bargagliotti, A., Franklin, C., Arnold, P., Gould, R., Johnson, S., Perez, L., & Spangler, D. (2020). Pre-K–12 Guidelines for Assessment and Instruction in Statistics Education II (GAISE II). American Statistical Association; National Council of Teachers of Mathematics. https://www.amstat.org/asa/files/pdfs/GAISE/GAISEIIPreK-12_Full.pdf

Bargagliotti, A., & Gould, R. (2022). Secondary data in the secondary data science and statistics classroom. In S. A. Peters, L. Zapata-Cardona, F. Bonafini, & A. Fan (Eds.), Bridging the gap: Empowering and educating today's learners in statistics. Proceedings of the eleventh International Conference on Teaching Statistics (ICOTS11, 2022), Rosario, Argentina. International Association for Statistical Education. https://icots.info/icots/11/proceedings/pdfs/ICOTS11_127_GOULD.pdf?1669865522

Ben-Zvi, D., & Arcavi, A. (2001). Junior high school students' construction of global views of data and data representations. Educational Studies in Mathematics, 45(1–3), 35–65. https://doi.org/10.1023/A:1013809201228

Biehler, R., Frischemeier, D., Gould, R., & Pfannkuch, M. (2024). Impacts of digitalization on content and goals of statistics education. In B. Pepin, G. Gueudet, & J. Choppin (Eds.), Handbook of digital resources in mathematics education (pp. 547–583). Springer. https://doi.org/10.1007/978-3-031-45667-1_20

Bolch, C., & Crippen, K. (2022). Data scientists’ epistemic thinking for creating and interpreting visualizations. Statistics Education Research Journal, 21(2), Article 11. https://doi.org/10.52041/serj.v21i2.21

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa

Buehring, R. S., & Grando, R. C. (2023). Reading and writing the world with children: Statistical thinking and multivariate data. Statistics Education Research Journal, 22(2), Article 6. https://doi.org/10.52041/serj.v22i2.446

Cairo, A. (2012). The functional art: An introduction to information graphics and visualization. New Riders.

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (Eds.). (1999). Readings in information visualization: Using vision to think. Morgan Kaufmann.

Cobb, G. W., & Moore, D. S. (1997). Mathematics, statistics, and teaching. The American Mathematical Monthly, 104(9), 801–823. https://doi.org/10.1080/00029890.1997.11990723

Cobb, P., & McClain, K. (2004). Principles of instructional design for supporting the development of students' statistical reasoning. In D. Ben-Zvi & J. Garfield (Eds.), The challenge of developing statistical literacy, reasoning and thinking (pp. 375–396). Springer. https://doi.org/10.1007/1-4020-2278-6

Engel, J. (2017). Statistical literacy for active citizenship: A call for data science education. Statistics Education Research Journal, 16(1), 44–49. https://doi.org/10.52041/serj.v16i1.213

Fielding, J., Makar, K., & Ben-Zvi, D. (2025). Developing students’ reasoning with data and data-ing. ZDM Mathematics Education, 57(1), 1–18. https://doi.org/10.1007/s11858-025-01671-6

Franke, B., Plante, J. F., Roscher, R., Lee, E. S. A., Smyth, C., Hatefi, A., Chen, F., Gil, E., Schwing, A., Selvitella, A., Hoffman, M. M., Grosse, R., Hendricks, D., & Reid, N. (2016). Statistical inference, learning and models in big data. International Statistical Review, 84(3), 371–389. https://doi.org/10.1111/insr.12176

Frischemeier, D., & Biehler, R. (2018). Stepwise development of statistical literacy and thinking in a statistics course for elementary preservice teachers. In T. Dooley & G. Gueudet (Eds.), Proceedings of the 10th Congress of the European Society for Research in Mathematics Education (pp. 756–763). DCU Institute of Education; ERME. https://hal.archives-ouvertes.fr/hal-01927856

Frischemeier, D., & Leavy, A. (2020). Improving the quality of statistical questions posed for group comparison situations. Teaching Statistics, 42(2), 58–65. https://doi.org/10.1111/test.12222

Graham, A. (2006). Developing thinking in statistics. Paul Chapman.

Hall, J. (2011). Engaging teachers and students with real data: Benefits and challenges. In C. Batanero, G. Burrill, & C. Reading (Eds.), Teaching statistics in school mathematics: Challenges for teaching and teacher education (pp. 335–346). Springer. https://doi.org/10.1007/978-94-007-1131-0_32

Higgins, T., Mokros, J., Rubin, A., & Sagrans, J. (2023). Students’ approaches to exploring relationships between categorical variables. Teaching Statistics, 45(S1), S52–S66. https://doi.org/10.1111/test.12331

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. Springer. https://doi.org/10.1007/978-1-4614-7138-7

Kazak, S., Fujita, T., & Turmo, M. P. (2023). Students’ informal statistical inferences through data modeling with a large multivariate dataset. Mathematical Thinking and Learning, 25(1), 23–43. https://doi.org/10.1080/10986065.2021.1922857

Kim, A. Y., Ismay, C., & Chunn, J. (2018). The fivethirtyeight R package: “Tame data” principles for introductory statistics and data science courses. Technology Innovations in Statistics Education, 11(1). https://doi.org/10.5070/T5111035892

Korea Sports Promotion Foundation. (n.d.). Physical fitness measurement data. Bigdata-Culture. https://www.bigdata-culture.kr/bigdata/user/data_market/detail.do?id=ace0aea7-5eee-48b9-b616-637365d665c1

Laina, V., & Wilkerson, M. H. (2016). Distributions, trends, and contradictions: A case study in sensemaking with interactive data visualizations. In C. K. Looi, J. L. Polman, U. Cress, & P. Reimann (Eds.), Transforming learning, empowering learners: The International Conference of the Learning Sciences (ICLS) 2016 (Vol. 2, pp. 934–937). International Society of the Learning Sciences. https://repository.isls.org/bitstream/1/347/1/140.pdf

Leavy, A., & Frischemeier, D. (2022). Developing the statistical problem posing and problem refining skills of prospective teachers. Statistics Education Research Journal, 21(1), Article 10. https://doi.org/10.52041/serj.v21i1.226

Lee, H., Mojica, G., Thrasher, E., & Baumgartner, P. (2022). Investigating data like a data scientist: Key practices and processes. Statistics Education Research Journal, 21(2), Article 3. https://doi.org/10.52041/serj.v21i2.41

Lee, V. R., & Wilkerson, M. H. (2018). Data use by middle and secondary students in the digital age: A status report and future prospects. National Academies of Sciences, Engineering, and Medicine. https://digitalcommons.usu.edu/itls_facpub/634/

Makar, K., & Rubin, A. (2009). A framework for thinking about informal statistical inference. Statistics Education Research Journal, 8(1), 82–105. https://doi.org/10.52041/serj.v8i1.457

Merriam, S. B., & Tisdell, E. J. (2016). Qualitative research: A guide to design and implementation (4th ed.). Jossey-Bass.

Podworny, S., Fleischer, Y., Stroop, D., & Biehler, R. (2022). An example of rich, real and multivariate survey data for use in school. In J. Hodgen, E. Geraniou, G. Bolondi, & F. Ferretti (Eds.), Proceedings of the twelfth Congress of the European Society for Research in Mathematics Education (CERME12) (pp. 940–947). ERME; Free University of Bozen-Bolzano. https://hal.science/CERME12/hal-03751842v1

Sutherland, S., & Ridgway, J. (2017). Interactive visualisations and statistical literacy. Statistics Education Research Journal, 16(1), 26–30. https://doi.org/10.52041/serj.v16i1.210

The Concord Consortium. (2014). Common Online Data Analysis Platform (CODAP) [Computer software]. https://codap.concord.org/

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Watson, J. M., & English, L. D. (2017). Statistical problem posing, problem refining, and further reflection in grade 6. Canadian Journal of Science, Mathematics and Technology Education, 17(4), 347–365. https://doi.org/10.1080/14926156.2017.1380867

Watson, J., & Fitzallen, N. (2015). Statistical software and mathematics education: Affordances for learning. In L. D. English & D. Kirshner (Eds.), Handbook of international research in mathematics education (3rd ed., pp. 563–594). Routledge. https://doi.org/10.4324/9780203448946-29

Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical Review, 67(3), 223–248. https://doi.org/10.1111/j.1751-5823.1999.tb00442.x

Wilke, C. O. (2019). Fundamentals of data visualization: A primer on making informative and compelling figures. O’Reilly Media.

Zapata-Cardona, L. (2025). Public engagement of underserved students with open civic data. ZDM Mathematics Education, 57(1), 19–30. https://doi.org/10.1007/s11858-024-01641-4

Published

2026-04-28

Section

Regular Articles