- Pub. date: May 15, 2020
- Pages: 237 - 258
Kelley’s Discrimination Index (DI) is a simple and robust classical non-parametric short-cut for estimating item discrimination power (IDP) in practical educational settings. Unlike the item–total correlation, DI can reach the ultimate values of +1 and ‒1, and it is stable against outliers. Because of its computational ease, DI is specifically suitable for rough estimation when sophisticated tools for item analysis, such as IRT modelling, are not available, as is usual, for example, in classroom testing. Unlike most other traditional indices of IDP, DI uses only the extreme cases of the ordered dataset in the estimation. One deficiency of DI is that it suits only dichotomous datasets. This article generalizes DI to allow polytomous datasets and flexible cut-offs for selecting the extreme cases. A new algorithm based on the concept of the characteristic vector of the item is introduced to compute the generalized DI (GDI). A new visual method for item analysis, the cut-off curve, is also introduced, based on a procedure called exhaustive splitting.
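To make the classical procedure concrete, here is a minimal sketch in Python. The dichotomous case follows Kelley’s original recipe (order examinees by total score, compare the upper and lower extreme groups, 27% by default); the polytomous variant shown after it, which scales the group-mean difference by the item’s observed score range so the index stays within ±1, is only an illustrative assumption, not necessarily the GDI algorithm introduced in the article.

```python
import numpy as np

def kelley_di(item, total, cut=0.27):
    """Kelley's DI for a dichotomous (0/1) item: order examinees by
    total score and subtract the proportion correct in the lower
    group from that in the upper group (upper/lower 27% by default)."""
    item, total = np.asarray(item, float), np.asarray(total, float)
    order = np.argsort(total)                 # ascending by total score
    n = max(1, int(round(cut * len(total))))  # size of each extreme group
    return item[order[-n:]].mean() - item[order[:n]].mean()

def generalized_di(item, total, cut=0.27):
    """Illustrative polytomous extension (an assumption for this sketch,
    not the article's characteristic-vector algorithm): divide the
    group-mean difference by the item's score range so the result
    remains in [-1, +1]."""
    item = np.asarray(item, float)
    spread = item.max() - item.min()
    if spread == 0:                           # constant item: no discrimination
        return 0.0
    return kelley_di(item, total, cut) / spread

# Hypothetical example: 200 examinees, one 0-4 polytomous item.
rng = np.random.default_rng(1)
ability = rng.normal(size=200)
item = np.clip(np.round(ability + rng.normal(scale=0.7, size=200)) + 2, 0, 4)
total = 5 * ability + rng.normal(size=200)    # stand-in for the total score
print(generalized_di(item, total))            # positive for a discriminating item
```

With `cut=0.27` the function reproduces the traditional 27% rule; passing other values of `cut` corresponds to the flexible cut-offs the article discusses.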
Keywords: Kelley’s discrimination index, item parameter, item–total correlation, item analysis, classical test theory.