International Journal of Educational Methodology

IJEM is a leading, peer-reviewed, open access, research journal that provides an online forum for studies in education, by and for scholars and practitioners, worldwide.

Publisher (HQ)

RHAPSODE
Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, HA4 7AE, UK
Research Article

Dimension-Corrected Somers’ D for the Item Analysis Settings

Jari Metsämuuronen

A new index of item discrimination power (IDP), dimension-corrected Somers’ D (D2), is proposed. Somers’ D is one of the superior alternatives to the item–total (Rit) and item–rest (Rir) correlations in reflecting the real IDP for items with scales 0/1 and 0/1/2, that is, with up to three categories. D also reaches the extreme values +1 and ‒1 correctly, whereas Rit and Rir cannot reach these extreme values in real-life testing settings. However, when the item has four or more categories, Somers’ D underestimates IDP more than the Pearson correlation does. A simple correction to Somers’ D in the polytomous case seems to be effective in item analysis settings. In the simulation with real-life items, D2 showed very few cases of obvious underestimation and practically no cases of obvious overestimation. With certain restrictions discussed in the article, D2 seems to be a good alternative to these classic estimators, not only with dichotomous items but also with polytomous ones. In general, the magnitudes of the estimates by D2 are higher than those by Rit, Rir, and polychoric correlation, and they seem to be close to those of the bi- and polyserial correlation coefficients, without out-of-range values.

Keywords: Item analysis, Pearson correlation, item–total correlation, item–rest correlation, Somers’ D, item discrimination power.
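
The claim that D reaches +1 where Rit and Rir cannot may be easier to see with a small numerical sketch. The code below is an illustration only, not the article's procedure: the eight-person data set is hypothetical, the function names `somers_d` and `pearson` are made up for this example, and D is computed in the direction that treats the score as the dependent variable (pairs tied on the item are excluded), which is an assumption about the intended direction. The D2 correction itself is not reproduced, because its formula is not given in the abstract.

```python
from itertools import combinations
from math import sqrt


def somers_d(item, score):
    """Somers' D with `score` as the dependent variable (assumed direction).

    D = (concordant - discordant) / (number of pairs not tied on the item).
    """
    concordant = discordant = untied = 0
    for (g1, x1), (g2, x2) in combinations(zip(item, score), 2):
        if g1 == g2:  # pair tied on the item: excluded from the denominator
            continue
        untied += 1
        direction = (g1 - g2) * (x1 - x2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if untied == 0:
        raise ValueError("item has no variance; D is undefined")
    return (concordant - discordant) / untied


def pearson(a, b):
    """Plain product-moment correlation, used here for Rit and Rir."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical toy data: a dichotomous item that discriminates perfectly,
# i.e. everyone who passes the item has a higher score than everyone who fails.
item = [0, 0, 0, 0, 1, 1, 1, 1]
rest = [0, 1, 2, 3, 4, 5, 6, 7]                # score on the other items
total = [g + r for g, r in zip(item, rest)]    # score including the item

d = somers_d(item, total)
rit = pearson(item, total)   # item-total correlation
rir = pearson(item, rest)    # item-rest correlation

print(f"Somers' D = {d:.3f}, Rit = {rit:.3f}, Rir = {rir:.3f}")
```

In this toy case every pair that differs on the item is concordant, so D equals 1, while Rit and Rir stay below 1 because a two-category item cannot be perfectly linearly related to a many-valued score.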

...