International Journal of Educational Methodology

IJEM is a leading, peer-reviewed, open access, research journal that provides an online forum for studies in education, by and for scholars and practitioners, worldwide.

Publisher (HQ)

RHAPSODE
Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, HA4 7AE, UK
Research Article

Data in the Educational and Social Sciences: It’s Time for Some Respect

Henry Braun

This article introduces the concept of the carrying capacity of data (CCD), defined as an integrated, evaluative judgment of the credibility of specific data-based inferences, informed by quantitative and qualitative analyses, leavened by experience. The sequential process of evaluating the CCD is represented schematically by a framework that can guide data analysis and statistical inference, as well as pedagogy. Aspects of each phase are illustrated with examples. A key initial activity in empirical work is data scrutiny, comprising consideration of data provenance and characteristics, as well as data limitations in light of the context and purpose of the study. Relevant auxiliary information can contribute to evaluating the CCD, as can sensitivity analyses conducted at the modeling stage. It is argued that early courses in statistical methods, and the textbooks they rely on, typically give little emphasis to, or omit entirely, discussion of the importance of data scrutiny in scientific research. This inattention and lack of guided, practical experience leave students unprepared for the real world of empirical studies. Instructors should both cultivate in their students a true respect for data and engage them in authentic empirical research involving real data, rather than the context-free data to which they are usually exposed.

Keywords: Authentic data examples, carrying capacity of data, data analysis framework, quantifying uncertainty, teaching data analysis.

Article Metrics: Views 429; Downloads 949; Citations: Crossref 2, Scopus 0

...