- Pub. date: May 15, 2020
- Pages: 297-317
A new index of item discrimination power (IDP), dimension-corrected Somers’ D (D2), is proposed. Somers’ D is one of the superior alternatives to the item–total (Rit) and item–rest (Rir) correlations in reflecting the real IDP of items with scales 0/1 and 0/1/2, that is, with up to three categories. D also correctly reaches the extreme values +1 and −1, whereas Rit and Rir cannot reach these limits in real-life testing settings. However, when the item has four or more categories, Somers’ D underestimates IDP more than the Pearson correlation does. A simple correction to Somers’ D in the polytomous case seems to be effective in item analysis settings. In a simulation with real-life items, D2 showed very few cases of obvious underestimation and practically no cases of obvious overestimation. With certain restrictions discussed in the article, D2 seems to be a good alternative to these classic estimators, not only with dichotomous items but also with polytomous ones. In general, the magnitudes of the estimates by D2 are higher than those by Rit, Rir, and the polychoric correlation, and they seem to be close to those of the bi- and polyserial correlation coefficients, without out-of-range values.
Keywords: Item analysis, Pearson correlation, item–total correlation, item–rest correlation, Somers’ D, item discrimination power.
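The abstract's claim that Somers' D can reach +1 where a Pearson item–score correlation cannot may be illustrated with a small sketch. It uses the standard definition of Somers' D (concordant minus discordant pairs, divided by the pairs not tied on the independent variable) and a made-up six-person data set with a perfectly ordered (Guttman-type) binary item. The function names, the data, and the choice of treating the score as the dependent variable are assumptions of this sketch, not taken from the article, and the D2 correction itself is not reproduced here because the abstract does not give its formula.

```python
from itertools import combinations
from math import sqrt


def somers_d(dependent, independent):
    """Somers' D of `dependent` given `independent`.

    D = (concordant - discordant) / (pairs not tied on `independent`).
    """
    concordant = discordant = untied = 0
    for (y1, x1), (y2, x2) in combinations(zip(dependent, independent), 2):
        if x1 == x2:
            continue  # pairs tied on the independent variable are excluded
        untied += 1
        product = (y1 - y2) * (x1 - x2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0: tied on the dependent variable only; counts in the
        # denominator but not in the numerator
    if untied == 0:
        raise ValueError("independent variable is constant")
    return (concordant - discordant) / untied


def pearson_r(a, b):
    """Plain Pearson product-moment correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical data: everyone who passes the item outscores everyone who fails it.
item = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 4, 5, 6]

d_score_given_item = somers_d(score, item)  # score dependent (assumed direction)
d_item_given_score = somers_d(item, score)  # the opposite direction, for contrast
r_item_score = pearson_r(item, score)

print(d_score_given_item, d_item_given_score, round(r_item_score, 4))
```

With this deterministic pattern, D with the score as the dependent variable equals 1, while the Pearson correlation stays below 1 (about 0.88) because a binary item and a six-point score cannot be linearly identical. The opposite direction of D is lower (0.6) because pairs tied on the item enter its denominator, which is why the direction has to be stated whenever D is reported.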