International Journal of Educational Methodology

IJEM is a leading, peer-reviewed, open access, research journal that provides an online forum for studies in education, by and for scholars and practitioners, worldwide.


Publisher (HQ)

RHAPSODE LTD
Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, UK. HA4 7AE

Number of Response Options, Reliability, Validity, and Potential Bias in the Use of the Likert Scale Education and Social Science Research: A Literature Review

Imam Kusmaryono, Dyana Wijayanti, Hevy Risqi Maharani


This study reviews 60 papers that used a Likert scale and were published between 2012 and 2021. Papers were screened using the PRISMA method. Data were analyzed through data extraction and then synthesized in a structured manner using the narrative method. To ensure credible results during data collection and analysis, a focus group discussion (FGD) was conducted. The findings show that only 10% of the studies used a measurement scale with an even number of response options (4, 6, 8, or 10). Most studies (90%) used an instrument based on a Likert scale with an odd number of response options (5, 7, 9, or 11), and the 5-point scale was the most popular among researchers. A rating scale with an odd number of response options greater than five (especially a seven-point scale) is the most effective in terms of reliability and validity coefficients; however, if the researcher wants to direct respondents toward one side, a scale with an even number of response options (six points) may be more suitable. Response bias and central tendency bias can affect the validity and reliability of a Likert scale instrument.

Keywords: Likert scale, literature review, potential bias, reliability and validity.
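
As an illustration of the quantities discussed in the abstract, the following minimal Python sketch computes one common reliability coefficient and a crude indicator of central tendency. It rests on assumptions: the abstract does not say which reliability coefficient the reviewed studies reported, so Cronbach's alpha is used here only as a familiar example; the function names `cronbach_alpha` and `midpoint_share` and the response data are hypothetical and are not taken from the reviewed papers.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Assumption: alpha is used as an example reliability coefficient;
    the source abstract does not name a specific coefficient.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*responses)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def midpoint_share(responses, n_points):
    """Share of all answers that fall on the scale midpoint.

    Scales with an even number of points have no midpoint, so 0.0 is returned.
    A high share is only a rough hint of central tendency, not proof of bias.
    """
    if n_points % 2 == 0:
        return 0.0
    mid = (n_points + 1) // 2
    flat = [score for row in responses for score in row]
    return flat.count(mid) / len(flat)


if __name__ == "__main__":
    # Hypothetical answers: 6 respondents, 4 items, 5-point scale.
    data = [
        [4, 5, 4, 4],
        [3, 3, 3, 2],
        [5, 5, 4, 5],
        [2, 3, 2, 2],
        [3, 3, 3, 3],
        [4, 4, 5, 4],
    ]
    print("alpha:", round(cronbach_alpha(data), 3))
    print("midpoint share:", round(midpoint_share(data, 5), 3))
```

Comparing these two values across versions of the same instrument with different numbers of response options is one simple way to examine the trade-off the abstract describes, although a full analysis would also need validity evidence.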



...