Data in the Educational and Social Sciences: It’s Time for Some Respect
- Pub. date: August 15, 2021
- Pages: 447-463
This article introduces the concept of the carrying capacity of data (CCD), defined as an integrated, evaluative judgment of the credibility of specific data-based inferences, informed by quantitative and qualitative analyses, leavened by experience. The sequential process of evaluating the CCD is represented schematically by a framework that can guide data analysis and statistical inference, as well as pedagogy. Aspects of each phase are illustrated with examples. A key initial activity in empirical work is data scrutiny, comprising consideration of data provenance and characteristics, as well as data limitations in light of the context and purpose of the study. Relevant auxiliary information can contribute to evaluating the CCD, as can sensitivity analyses conducted at the modeling stage. It is argued that early courses in statistical methods, and the textbooks they rely on, typically give little emphasis to, or omit entirely, discussion of the importance of data scrutiny in scientific research. This inattention and lack of guided, practical experience leave students unprepared for the real world of empirical studies. Instructors should both cultivate in their students a true respect for data and engage them in authentic empirical research involving real data, rather than the context-free data to which they are usually exposed.
Keywords: Authentic data examples, carrying capacity of data, data analysis framework, quantifying uncertainty, teaching data analysis.
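As a rough illustration of the modeling-stage sensitivity analyses mentioned in the abstract, the sketch below follows the simulated-confounder idea cited in the reference list (Carnegie, Harada, & Hill, 2016). It is a minimal, hypothetical example: the data-generating process, the variable names, and the effect sizes are invented for illustration and are not taken from the article itself.

```python
# Minimal sketch of a sensitivity analysis for an unmeasured confounder,
# in the spirit of the simulated-confounder approach (Carnegie, Harada, & Hill, 2016).
# All data, variable names, and effect sizes here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical observational data: one observed covariate, a binary "treatment",
# and an outcome that depends on both plus an unobserved confounder u.
x = rng.normal(size=n)
u = rng.normal(size=n)                      # unmeasured in the naive analysis
treat = (0.5 * x + 0.5 * u + rng.normal(size=n) > 0).astype(float)
y = 1.0 * treat + 0.8 * x + 0.8 * u + rng.normal(size=n)

def treatment_coef(extra_cols=None):
    """OLS estimate of the treatment coefficient, optionally adding columns."""
    cols = [np.ones(n), treat, x]
    if extra_cols is not None:
        cols.extend(extra_cols)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

print(f"naive estimate (confounder omitted): {treatment_coef():.3f}")

# Probe sensitivity: add a simulated confounder correlated with u at
# increasing strengths and watch how the estimated effect moves.
for strength in (0.0, 0.5, 1.0):
    proxy = strength * u + np.sqrt(max(1 - strength**2, 0.0)) * rng.normal(size=n)
    print(f"assumed confounder strength {strength:.1f}: "
          f"estimate = {treatment_coef([proxy]):.3f}")
```

Re-fitting the model over a grid of assumed confounder strengths shows how far the naive estimate can drift, which is the kind of evidence that bears on the carrying capacity of the data underlying a particular inference.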
References
An, C., Braun, H. I., & Walsh, M. E. (2018). Examining estimates of intervention effectiveness using sensitivity analysis. Educational Measurement: Issues and Practice, 37(2), 45-53. https://doi.org/10.1111/emip.12176
Becker, W., Paruolo, P., & Saltelli, A. (2014). Exploring Hoover and Perez’s experimental design using global sensitivity analysis. arXiv. https://arxiv.org/pdf/1401.5617.pdf
Benjamini, Y., Hechtlinger, Y., & Stark, P. B. (2019, June 2). Confidence intervals for selected parameters. arXiv. https://arxiv.org/pdf/1906.00505.pdf
Benjamini, Y., Heller, R., & Yekutieli, D. (2009). Selective inference in complex research. Philosophical Transactions of the Royal Society A, 367, 4255-4271. https://doi.org/10.1098/rsta.2009.0127
Berk, R., Brown, L., Buja, A., Zhang, K., & Zhao, L. (2013). Valid post-selection inference. Annals of Statistics, 41(2), 802-837. https://doi.org/10.1214/12-AOS1077
Berk, R. A., & Freedman, D. A. (2003). Statistical assumptions as empirical commitments. In T. G. Blomberg & S. Cohen (Eds.), Law, punishment and social control: Essays in honor of Sheldon Messinger (2nd ed., pp. 235-254). Aldine de Gruyter.
Blocker, A. W., & Meng, X. L. (2013). The potential and perils of preprocessing: Building new foundations. Bernoulli, 19(4), 1176–1211. https://doi.org/10.3150/13-BEJSP16
Bond, T. N., & Lang, K. (2019). The sad truth about happiness scales. Journal of Political Economy, 127(4), 1629-1640. https://doi.org/10.1086/701679
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. John Wiley & Sons. https://doi.org/10.1002/9780470743386
Borgman, C. L. (2019). The lives and after lives of data. Harvard Data Science Review, 1(1). https://doi.org/10.1162/99608f92.9a36bdb6
Braun, H. I. (1989, August 6-11). Analysis of retrospective ascertainment data in a legal setting [Paper presentation]. 1989 Annual Meeting of the American Statistical Association, American Statistical Association, Washington, DC, United States.
Braun, H. I. (1990). Data in the social sciences: It’s time for some respect [Unpublished report]. Educational Testing Service.
Braun, H. I. (2015). The value in value-added depends on the ecology. Educational Researcher, 44(2), 127-131. https://doi.org/10.3102/0013189X15576341
Braun, H. I., Jenkins, F., & Chaney, B. (2017). Value-added evaluation of teacher preparation programs: Sensitivity of rankings to model specification. Center for the Study of Testing, Evaluation, and Education Policy, Boston College.
Braun, H. I., Kirsch, I., & Yamamoto, K. (2011). An experimental study of the effects of monetary incentives on performance on the 12th grade NAEP reading assessment. Teachers College Record, 113(11), 2309-2344.
Braun, H. I., & Wainer, H. (2007). Value-added modeling. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics (Vol. 27, pp. 867-892). Elsevier Science. https://doi.org/10.1080/09332480.2011.10739845
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16, 199–231. https://doi.org/10.1214/ss/1009213726
Carnegie, N., Harada, M., & Hill, J. (2016). Assessing sensitivity to unmeasured confounding using a simulated potential confounder. Journal of Research on Educational Effectiveness, 9(3), 395-420. https://doi.org/10.1080/19345747.2015.1078862
Chetty, R., Friedman, J. N., Hilger, N., Saez, E., Schanzenbach, D. W., & Yagan, D. (2011). How does your kindergarten classroom affect your earnings? Evidence from Project Star. The Quarterly Journal of Economics, 126(4), 1593-1660. https://doi.org/10.1093/qje/qjr041
Darling-Hammond, L., Amrein-Beardsley, A., Haertel, E., & Rothstein, R. (2012). Evaluating teacher evaluation. Phi Delta Kappan, 93(6), 8-15. https://doi.org/10.1177/003172171209300603
Dearing, E., & Zachrisson, H. D. (2019). Taking selection seriously in correlational studies of child development: A call for sensitivity analyses. Child Development Perspectives, 13(4), 267-273. https://doi.org/10.1111/cdep.12343
Diaconu, D. V. (2012). Modeling science achievement differences between single-sex and coeducational schools: Analyses from Hong Kong, SAR and New Zealand from TIMSS 1995, 1999, and 2003 [Unpublished doctoral dissertation]. Boston College.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745-766. https://doi.org/10.1080/10618600.2017.1384734
Freedman, D. A. (2010). Statistical models and causal inference: A dialogue with the social sciences. Cambridge University Press. https://doi.org/10.1017/CBO9780511815874
Ginsberg, A. L., Noel, J., & Plisko, V. W. (1988). Lessons from the Wall Chart. Educational Evaluation and Policy Analysis, 10(1), 1-12. https://doi.org/10.2307/1163860
Goldhaber, D., Holden, K. L., & Grout, C. (2019). Errors in administrative education data: A cautionary tale. Educational Researcher, 48(3), 179-182. https://doi.org/10.3102/0013189X19837598
Hargreaves, A., & Braun, H. I. (2012). Leading for All: A research report of the development, design, implementation and impact of Ontario’s “Essential for Some, Good for All” initiative. Council of Ontario Directors of Education. https://doi.org/10.1108/JPCC-06-2019-0013
Harris, D. (2009). Would accountability based on teacher value-added be smart policy? An examination of the statistical properties and policy alternatives. Education Finance and Policy, 4(4), 319-350. https://doi.org/10.1162/edfp.2009.4.4.319
Hedges, L. (1988). The meta-analysis of test validity: Some new approaches. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 191-212). Lawrence Erlbaum Associates.
Holland, P. W., & Wainer, H. (1990). Sources of uncertainty often ignored in adjusting state mean SAT scores for differential participation rates: The rules of the game. Applied Measurement in Education, 3(2), 167-184. https://doi.org/10.1207/s15324818ame0302_3
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), 697-701. https://doi.org/10.1371/journal.pmed.0020124
Jankowski, N., & Marshall, D. W. (2017). Degrees that matter: Moving higher education to a learning systems paradigm. Stylus Publishing, LLC.
Kalbfleisch, J. D., & Lawless, J. F. (1989). Inference based on retrospective ascertainment: An analysis of data in transfusion related AIDS. Journal of the American Statistical Association, 84(406), 360-372. https://doi.org/10.2307/2289919
Keller, S. A., Shipp, S. S., Schroeder, A. D., & Korkmaz, G. (2020). Doing data science: A framework and case study. Harvard Data Science Review. https://doi.org/10.1162/99608f92.2d83f7f5
Kenett, R. S., & Redman, T. C. (2019). The real work of data science: Turning data into information, better decisions, and stronger organizations. Wiley. https://doi.org/10.1002/9781119570790
Leonelli, S. (2019). Data governance is key to interpretation: Reconceptualizing data in data science. Harvard Data Science Review. https://doi.org/10.1162/99608f92.17405bb6
Lockwood, J., & McCaffrey, D. (2007). Controlling for individual heterogeneity in longitudinal models, with applications to student achievement. Electronic Journal of Statistics, 1, 223-252. https://doi.org/10.1214/07-EJS057
Maronna, R. A. (2018). Robust statistics: Theory and methods (2nd ed.). Wiley. https://doi.org/10.1002/9781119214656
Mayo, D. G., & Cox, D. R. (2010). Frequentist statistics as a theory of inductive inference. In D. G. Mayo & A. Spanos (Eds.), Error and Inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science. Cambridge University Press.
McCaffrey, D. F., & Culpepper, S. A. (2021). Introduction to JEBS special issue on NAEP linked aggregate scores. Journal of Educational and Behavioral Statistics, 46(2), 135-137. https://doi.org/10.3102/10769986211001480
Meng, X. L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2), 685-726. https://doi.org/10.1214/18-AOAS1161SF
Montgomery, M. R., Richards, T., & Braun, H. I. (1986). Child health, breastfeeding and survival in Malaysia: A random effects logit approach. Journal of the American Statistical Association, 81(394), 297-309. https://doi.org/10.1080/01621459.1986.10478273
Nye, B., Hedges, L. V., & Konstantopoulos, S. (2001). The long-term effects of small classes in early grades: Lasting benefits in mathematics achievement at grade nine. Journal of Experimental Education, 69, 245-257. https://doi.org/10.1080/00220970109599487
Organisation for Economic Co-operation and Development. (2013). Survey of adult skills technical report (2nd ed.). OECD Publishing.
Oster, E. (2019). Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics, 37, 187–204. https://doi.org/10.1080/07350015.2016.1227711
Pearman, P. A., II, Springer, M. P., Lipsey, M., Lachowicz, M., Swain, W., & Farran, D. (2020). Teachers, schools, and pre-K effect persistence: An examination of the sustaining environment hypothesis. Journal of Research on Educational Effectiveness, 13(4), 547-573. https://doi.org/10.1080/19345747.2020.1749740
Provasnik, S. (2021). Process data, the new frontier for assessment development: Rich new soil or a quixotic quest? Large-scale Assessments in Education, 9(1). https://doi.org/10.1186/s40536-020-00092-z
Reardon, S. F., Kalogrides, D., & Ho, A. (2021). Validation methods for aggregate-level test scale linking: A case study mapping school district test score distributions to a common scale. Journal of Educational and Behavioral Statistics, 46(2), 138-167. https://doi.org/10.3102/1076998619874089
Reardon, S. F., & Raudenbush, S. W. (2009). Assumptions of value-added models for estimating school effects. Education Finance and Policy, 4(4), 492-519. https://doi.org/10.1162/edfp.2009.4.4.492
Rosenbaum, P. R. (1989). Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74, 13-26. https://doi.org/10.1093/biomet/74.1.13
Rosenbaum, P. R. (2002). Observational studies (2nd ed.). Springer-Verlag. https://doi.org/10.1007/978-1-4757-3692-2
Rosenbaum, P. R., & Rubin, D. B. (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society: Series B (Methodological), 45(2), 212-218. https://doi.org/10.1111/j.2517-6161.1983.tb01242.x
Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537-571. https://doi.org/10.1162/edfp.2009.4.4.537
Russell, M., Ludlow, L., & O’Dwyer, L. (2019). Preparing the next generation of educational measurement specialists: A call for programs with an integrated scope and sequence. Educational Measurement: Issues and Practice, 38(4), 78-86. https://doi.org/10.1111/emip.12285
Sanders, W., Saxton, A., & Horn, S. (1997). The Tennessee value-added assessment system: A quantitative, outcome-based approach to educational assessment. In J. Millman (Ed.), Grading teachers, grading schools: Is student achievement a valid evaluation measure? (pp. 137-162). Corwin Press. https://doi.org/10.3102/10769986029001037
Shavelson, R. J., & Towne, L. (Eds.) (2002). Scientific research in education. National Academies Press. https://doi.org/10.17226/10236
Singer, J. D., Braun, H. I., & Chudowsky, N. (Eds.) (2018). International education assessments: Cautions, conundrums, and common sense. National Academy of Education. https://doi.org/10.31094/2018/1
Singer, J. D., & Willett, J. B. (1990). Improving the teaching of applied statistics: Putting the data back into data analysis. The American Statistician, 44(3), 223-230. https://doi.org/10.2307/2685342
Stark, P. B., & Saltelli, A. (2018, August). Cargo-cult statistics and scientific crisis. Significance, 15(4), 40-43. https://doi.org/10.1111/j.1740-9713.2018.01174.x
Steelman, L. C., & Powell, B. (1985). Appraising the implications of the SAT for education policy. Phi Delta Kappan, 67, 603-606.
Tibshirani, R. J., Taylor, J., Lockhart, R., & Tibshirani, R. (2016). Exact postselection inference for sequential regression procedures. Journal of the American Statistical Association, 111(514), 600-620. https://doi.org/10.1080/01621459.2015.1108848
Tipton, E. (2014). How generalizable is your experiment? An index for comparing experimental samples and populations. Journal of Educational and Behavioral Statistics, 39(6), 478-501. https://doi.org/10.3102/1076998614558486
Tipton, E., & Olsen, R. (2018). A review of statistical methods for generalizing from evaluations of educational interventions. Educational Researcher, 47(8), 516-524. https://doi.org/10.3102/0013189X18781522
Tukey, J. W. (1962). The future of data analysis. The Annals of Mathematical Statistics, 33(1), 1-67. https://doi.org/10.1214/aoms/1177704711
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
van der Sluijs, J., Craye, M., Funtowicz, S., Kloprogge, P., Ravetz, J., & Risbey, J. (2005). Combining quantitative and qualitative measures of uncertainty in model based environmental assessment: The NUSAP System. Risk Analysis, 25(2), 481-492. https://doi.org/10.1111/j.1539-6924.2005.00604.x
von Davier, M., & Sinharay, S. (2014). Analytics in international large scale assessments: Item response theory and population models. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 155-174). CRC Press.
Wainer, H. (1986a). The SAT as a social indicator: A pretty bad idea. In H. Wainer (Ed.), Drawing inferences from self-selected samples (pp. 7-21). Springer-Verlag. https://doi.org/10.1007/978-1-4612-4976-4_2
Wainer, H. (1986b). Five pitfalls encountered when trying to compare states on their SAT scores. Journal of Educational Statistics, 11, 239-244. https://doi.org/10.1111/j.1745-3984.1986.tb00235.x
Walsh, M. E., Madaus, G. F., Raczek, A. E., Dearing, E., Foley, C., An, C., Lee-St. John, T. L., & Beaton, A. E. (2014). A new model for student support in high-poverty urban elementary schools: Effects on elementary and middle school academic outcomes. American Educational Research Journal, 51(4), 704-737. https://doi.org/10.3102/0002831214541669
Wolf, R., Morrison, J., Inns, A., Slavin, R., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness, 13(2), 428-447. https://doi.org/10.1080/19345747.2020.1726537