Number of Response Options, Reliability, Validity, and Potential Bias in the Use of the Likert Scale Education and Social Science Research: A Literature Review
- Pub. date: November 15, 2022
- Pages: 625-637
This study reviews 60 papers that used a Likert scale and were published between 2012 and 2021. The literature was screened using the PRISMA method. Data were analyzed through data extraction and then synthesized in a structured manner using the narrative method. To ensure credible results, a focus group discussion (FGD) was conducted during the data collection and data analysis stages. The findings show that only 10% of the studies used a measurement scale with an even number of response options (4, 6, 8, or 10). The remaining 90% used a Likert scale with an odd number of response options (5, 7, 9, or 11), and the five-point scale was the most popular. A rating scale with an odd number of response options greater than five (especially a seven-point scale) is the most effective in terms of reliability and validity coefficients, but if the researcher wants to direct respondents to one side, a scale with an even number of response options (six points) may be more suitable. Response bias and central tendency bias can affect the validity and reliability of a Likert scale instrument.
Keywords: Likert scale, literature review, potential bias, reliability and validity.