International Journal of Educational Methodology

IJEM is a leading peer-reviewed, open-access research journal that provides an online forum for studies in education, by and for scholars and practitioners worldwide.

Publisher (HQ)

Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, UK. HA4 7AE

Synthetic Longitudinal Education Database: Linking National Datasets for K-16 Education and College Readiness

Jaekyung Lee, Joseph Jaeger

The U.S. education policy of “college for all” lacks supporting data and indicators on K-16 education pathways, i.e., on how well all students get ready and stay on track from kindergarten through college. This study creates a synthetic national longitudinal education database that helps track and support students’ educational pathways by combining two nationally representative U.S. sample datasets: the Early Childhood Longitudinal Study-Kindergarten (ECLS-K; kindergarten through 8th grade) and the National Education Longitudinal Study (NELS; 8th grade through age 25). Linking these national datasets through statistical matching and imputation techniques can help bridge the gap between the data and research silos of elementary and secondary/postsecondary education. Using this synthetic K-16 longitudinal education database, the study applies machine learning data analytics to search for early indicators of college readiness among kindergarten students. It shows the utilities and limitations of linking preexisting national datasets to impute education pathways and assess college readiness. It also discusses implications for developing a more holistic and equitable educational assessment system in support of a K-16 longitudinal education database.
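The abstract names statistical matching and imputation as the linking method but does not detail the procedure. As a minimal sketch only, assuming a k-nearest-neighbour random hot deck (one standard statistical matching technique, not necessarily the one used in the study), the Python below gives each ECLS-K-like record the later outcomes of a NELS-like record that looks similar on the 8th-grade variables both files share. The function names (`hot_deck_match`, `pooled_mean`), the variable names (`g8_math`, `g8_read`, `k_read`, `ba_by_25`) and the toy values are all hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def hot_deck_match(recipients, donors, common, donated, k=3, m=5, seed=0):
    """k-nearest-neighbour random hot deck, repeated m times (illustrative sketch).

    Each recipient record (e.g. from ECLS-K) receives the `donated` variables
    of one donor record (e.g. from NELS) drawn at random from its k closest
    donors, where closeness is measured on the `common` variables observed in
    both files. Repeating the draw m times yields m completed datasets, in the
    spirit of multiple imputation.
    """
    rng = random.Random(seed)

    # Scale each common variable by its pooled standard deviation so that
    # variables measured in different units contribute comparably.
    pooled = recipients + donors
    scale = {}
    for c in common:
        sd = pstdev([row[c] for row in pooled])
        scale[c] = sd if sd > 0 else 1.0

    def dist(a, b):
        return math.sqrt(sum(((a[c] - b[c]) / scale[c]) ** 2 for c in common))

    def nearest(r):
        # Sorting is stable, so ties keep the donors' original order.
        return sorted(donors, key=lambda d: dist(r, d))[:k]

    # Find each recipient's k nearest donors once; only the draw varies.
    pools = [nearest(r) for r in recipients]

    completed = []
    for _ in range(m):
        dataset = []
        for r, pool in zip(recipients, pools):
            donor = rng.choice(pool)
            row = dict(r)  # copy, keeping all recipient (kindergarten) variables
            row.update({v: donor[v] for v in donated})
            dataset.append(row)
        completed.append(dataset)
    return completed


def pooled_mean(completed, var):
    """Point estimate averaged across imputations (Rubin's rules, mean only)."""
    return mean(mean(row[var] for row in dataset) for dataset in completed)


# Toy records with hypothetical variable names and values.
eclsk = [
    {"id": "A1", "k_read": 0.2, "g8_math": 48.0, "g8_read": 50.0},
    {"id": "A2", "k_read": 1.1, "g8_math": 62.0, "g8_read": 60.0},
]
nels = [
    {"g8_math": 47.0, "g8_read": 51.0, "ba_by_25": 0},
    {"g8_math": 49.0, "g8_read": 49.0, "ba_by_25": 1},
    {"g8_math": 61.0, "g8_read": 61.0, "ba_by_25": 1},
    {"g8_math": 63.0, "g8_read": 59.0, "ba_by_25": 1},
]
imputations = hot_deck_match(eclsk, nels, ["g8_math", "g8_read"], ["ba_by_25"], k=2, m=5)
print(pooled_mean(imputations, "ba_by_25"))
```

Drawing among several close donors, rather than always taking the single nearest one, produces several completed datasets whose differences reflect part of the imputation uncertainty; only the pooled point estimate is computed here, not Rubin's combined variance. Matching of this kind rests on the conditional independence assumption: kindergarten measures and postsecondary outcomes are treated as independent once the shared 8th-grade variables are accounted for, and the two files alone cannot test that assumption.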

Keywords: College readiness, longitudinal database, machine learning, multiple imputation, synthetic data.
