International Journal of Educational Methodology

IJEM is a leading, peer-reviewed, open access, research journal that provides an online forum for studies in education, by and for scholars and practitioners, worldwide.

Publisher (HQ)

RHAPSODE LTD
Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, UK. HA4 7AE

'item measurement' Search Results



...

Assessment for Learning (AfL) may be conceptualized as minute-to-minute, day-by-day interactions between learners and teachers with the improvement of learning as the principal focus. This paper traces the development of an AfL measurement instrument (scale) that can be used for research purposes prior to, during and following professional development in the area. Rasch measurement procedures were applied to data drawn from a convenience sample of 594 teachers from 44 elementary schools in Ireland to create a scale consisting of 20 items distributed across four key AfL assessment strategies: learning intentions and success criteria, questioning and classroom discussion, feedback, and peer- and self-assessment. This scale, the Assessment for Learning Measurement instrument (AfLMi), has good psychometric properties and is interpretable in a way that makes it potentially useful during system-wide improvement initiatives focused on AfL.

10.12973/ijem.3.2.103
Pages: 103-115
Article Metrics
Views
1423
Download
1710
Citations
Crossref
5

Scopus

...

This research was conducted to investigate the predictive role of homophobia and unconditional self-acceptance on respect of differences in psychological counselor candidates. Participants were 239 psychological counselor candidates. The Respect of Differences Scale, the Homophobia Scale, and the Unconditional Self-Acceptance Scale were used to collect the data. Path analysis was used to determine the influences of variables on respect of differences. The independent sample t-test and one-way ANOVA were used to determine differences between participants in terms of gender and grade. The results of the analysis indicated that homophobia and unconditional self-acceptance are predictors of respect of differences, and place of living and traditionally have an indirect effect on respect of differences. In addition, female participants reported a higher level of respect of differences than male participants. Similarly, first year college students reported a higher level of respect of differences than fourth year college students.

10.12973/ijem.5.1.59
Pages: 59-70
Article Metrics
Views
499
Download
888
Citations
Crossref
2

Scopus

...

The aim of this study is to compare the 2018 Science Course Curriculum (SCC), the 2015 Trends in International Mathematics and Science Study (TIMSS) and the 2018 High School Entrance Examination (HSE) in terms of content domains, cognitive domains and learning objectives. A qualitative research method was used in this study. Data were analyzed using document review matrices to determine the similarities and differences between the objectives of SCC, TIMSS and HSE. SCC outcomes and HSE science questions were also classified according to TIMSS cognitive domains. Results show that the learning objectives of the fields of Physics, Biology and Earth Sciences of TIMSS are compatible with those of all grade levels of SCC and that the objectives of Chemistry are compatible with those of the seventh and eighth grades. Most HSE questions are compatible with the objectives of SCC; however, the latest revision in the curriculum has introduced some eighth grade objectives to other grade levels. HSE science questions measure higher-level skills than TIMSS science questions. The SCC subject domain “Organisms and Life” has the most learning objectives at the levels of “knowing” and “reasoning,” while the subject domain “Physical Events” has the most learning objectives at the level of “applying.” In addition, the seventh, fifth and eighth grades have the most objectives at the levels of “knowing,” “applying,” and “reasoning,” respectively. It is hoped that the results will contribute to the literature on the improvement of science curricula and the interpretation of national and international exams.

10.12973/ijem.5.3.433
Pages: 433-449
Article Metrics
Views
849
Download
1117
Citations
Crossref
2

Scopus

...

Teacher-made tests (TMT) are the most used instruments for assessment and evaluation. This study investigates the cognitive requirements, test construction errors, and item types of TMTs. Content analysis is used to analyze and classify TMT items based on the TIMSS-2019 assessment framework and on criteria constructed to determine test construction errors. The data consist of 548 items in 30 exam papers of 18 mathematics teachers from 13 distinct schools. The distribution of TIMSS-2019 cognitive demands across all TMTs indicates a strong emphasis on the knowing and applying cognitive domains, which together account for 93% of items. Since 83% of all questions are multiple-choice and 17% are constructed-response, teachers mostly prefer the multiple-choice item type. Findings also reveal that, except for face validity, there are errors concerning test construction. Consequently, it is suggested that teachers take more care in preparing items of higher cognitive levels, tests of mixed item types, and tests with fewer construction errors, for more reliable tests. Finally, it is also suggested that measurement and evaluation specialists be employed in each school, or at least in each local Ministry of National Education Authority, in order to support teachers; if this is not possible in the near term, there must be in-service training programs on measurement and evaluation for teachers to participate in.

10.12973/ijem.5.3.479
Pages: 479-488
Article Metrics
Views
349
Download
755
Citations
Crossref
3

Scopus

...

This study describes the development and validation of a psychometrically sound instrument, the Active Learning Strategies Inventory (ALSI), designed to measure learners’ perceptions of their active learning strategies within an active learning context. Active learning encompasses a broad range of pedagogical practices and instructional methods that connect with an individual learner's active learning strategies. In order to fulfill the study's goals, a conceptual framework on learners’ active learning strategies was developed and proposed, drawing upon the research literature on active learning. The development and construct validation of the ALSI, based on the conceptual and methodological underpinnings, involved identifying five scales of learners’ active learning strategies: engagement, cognitive processing, orientation to learning, readiness to learn and motivational orientation. An item pool of 20 items was generated following an extensive review of the literature, standardized card sorting procedures including confirmatory factor analysis and scale validation of a pilot (n = 407) survey. The ALSI scale demonstrated strong internal consistency and reliability, with Cronbach's alpha values ranging from 0.81 to 0.87. High item loading scores from the factor analysis provided initial support for the instrument's construct validity under the five-factor model. The ALSI scale provides a reliable and valid method for researchers and academicians who wish to measure learners' perceptions of their active learning strategies within an active learning context. Finally, we discuss the implications and address the limitations and directions for future research.
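
As a rough illustration of the internal-consistency statistic reported above, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy responses are assumptions made for illustration; they are not the ALSI data or the authors' code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 Likert items (not data from the study)
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together; with only a handful of hypothetical respondents the number here is illustrative only.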

10.12973/ijem.7.1.201
Pages: 201-223
Article Metrics
Views
1011
Download
1209
Citations
Crossref
5

Scopus
4

...

The purpose of this study is to examine the mediator role of cognitive flexibility and difficulties in emotion regulation in the relationship between resilience and distress tolerance amongst college students. The sample of the study involved 1114 students (771 females, 343 males) from various universities in Turkey. The mean age of the sample was 20.65 (SD = 2.77). The Resilience Scale, Distress Tolerance Scale, Cognitive Flexibility Scale, and Difficulties in Emotion Regulation Scale (DERS) were used to collect data. In this study, a Serial Multiple Mediation Model was used, as proposed by Hayes. The findings showed that people with a higher level of distress tolerance possess higher degrees of cognitive flexibility and that cognitively more flexible individuals experience less difficulty in emotion regulation, and thus, lower levels of difficulty in emotion regulation were associated with an increase in resilience. Furthermore, the model in its entirety proved to be statistically significant, accounting for 42% of the total variance.

10.12973/ijem.5.4.525
Pages: 525-533
Article Metrics
Views
3601
Download
4348
Citations
Crossref
41

Scopus
33

...

The Pearson product–moment correlation coefficient between item g and test score X, known as the item–test or item–total correlation (Rit), and the item–rest correlation (Rir) are two of the most used classical estimators of item discrimination power (IDP). Both Rit and Rir underestimate IDP because of the mismatch between the scales of the item and the score. Underestimation of IDP may be drastic when the difficulty level of the item is extreme. Based on a simulation, in a binary dataset, a good alternative to Rit and Rir could be Somers’ D: it reaches the ultimate values +1 and –1, it underestimates IDP remarkably less than Rit and Rir, and, being a robust statistic, it is more stable against changes in the data structure. Somers’ D has, however, one major disadvantage in a polytomous case: it tends to underestimate the magnitude of the association of item and score more than Rit does when the item scale has four categories or more.
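
To make the two classical estimators concrete, here is a minimal sketch of Rit (item against total score) and Rir (item against the total with that item removed), both computed as plain Pearson correlations. The helper names and the small deterministic response pattern are assumptions for illustration, not material from the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_and_rest(rows, g):
    """Return (Rit, Rir) for item index g in a respondents-by-items matrix."""
    item = [row[g] for row in rows]
    total = [sum(row) for row in rows]
    rest = [t - i for t, i in zip(total, item)]  # total score without item g
    return pearson(item, total), pearson(item, rest)


# Hypothetical deterministic (Guttman-like) pattern: 5 test-takers x 4 binary items
rows = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rit, rir = item_total_and_rest(rows, 0)
print(round(rit, 3), round(rir, 3))
```

In this toy pattern the easiest item orders test-takers without any reversal, yet both coefficients stay well below 1, which is the kind of underestimation for extreme-difficulty items that the abstract describes.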

10.12973/ijem.6.1.207
Pages: 207‒221
Article Metrics
Views
1098
Download
1397
Citations
Crossref
16

Scopus

...

Kelley’s Discrimination Index (DI) is a simple and robust, classical non-parametric short-cut to estimate the item discrimination power (IDP) in practical educational settings. Unlike item–total correlation, DI can reach the ultimate values of +1 and ‒1, and it is stable against outliers. Because of its computational ease, DI is specifically suitable for rough estimation where sophisticated tools for item analysis such as IRT modelling are not available, as is usual, for example, in classroom testing. Unlike most of the other traditional indices for IDP, DI uses only the extreme cases of the ordered dataset in the estimation. One deficiency of DI is that it suits only dichotomous datasets. This article generalizes DI to allow polytomous datasets and flexible cut-offs for selecting the extreme cases. A new algorithm based on the concept of the characteristic vector of the item is introduced to compute the generalized DI (GDI). A new visual method for item analysis, the cut-off curve, is introduced based on the procedure called exhaustive splitting.
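
The classical, dichotomous DI can be sketched as the difference between the proportions answering the item correctly in the highest-scoring and lowest-scoring groups. The sketch below assumes the conventional 27% cut-off as a default and exposes it as a `fraction` parameter; it does not implement the article's generalized DI or its characteristic-vector algorithm, and the function name and data are hypothetical.

```python
def kelley_di(rows, g, fraction=0.27):
    """Classical DI for binary item g: p(correct | upper group) - p(correct | lower group)."""
    # Order test-takers by total score, highest first
    ranked = sorted(rows, key=sum, reverse=True)
    n = max(1, round(fraction * len(ranked)))  # size of each extreme group
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[g] for row in upper) / n
    p_lower = sum(row[g] for row in lower) / n
    return p_upper - p_lower


# Hypothetical data: 10 test-takers x 4 binary items; the last item is answered correctly by everyone
rows = (
    [[1, 1, 1, 1]] * 3
    + [[1, 1, 0, 1]] * 2
    + [[1, 0, 0, 1]] * 2
    + [[0, 0, 0, 1]] * 3
)
print(kelley_di(rows, 0), kelley_di(rows, 3))
```

Only the extreme groups enter the calculation, so an item that everyone answers correctly gets a DI of 0 while an item that separates the two groups completely gets a DI of 1.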

10.12973/ijem.6.2.237
Pages: 237 - 258
Article Metrics
Views
836
Download
1029
Citations
Crossref
6

Scopus

...

A new index of item discrimination power (IDP), dimension-corrected Somers’ D (D2), is proposed. Somers’ D is one of the superior alternatives to item–total (Rit) and item–rest (Rir) correlation in reflecting the real IDP for items with scales 0/1 and 0/1/2, that is, up to three categories. D also reaches the extreme values +1 and ‒1 correctly, while Rit and Rir cannot reach the ultimate values in real-life testing settings. However, when the item has four categories or more, Somers’ D underestimates IDP more than Pearson correlation. A simple correction to Somers’ D in the polytomous case seems to be effective in item analysis settings. In the simulation with real-life items, D2 showed very few cases of obvious underestimation and practically no cases of obvious overestimation. With certain restrictions discussed in the article, D2 seems to be a good alternative to these classic estimators not only with dichotomous items but also with polytomous ones. In general, the magnitudes of the estimates by D2 are higher than those by Rit, Rir, and polychoric correlation, and they seem to be close to those of bi- and polyserial correlation coefficients without out-of-range values.
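
For orientation, here is a minimal sketch of the uncorrected Somers’ D using the textbook definition D(Y|X) = (concordant - discordant) / (pairs not tied on X). The abstract does not state which direction the article uses, so calling the function with the item as X and the score as Y is an assumption of this sketch, and the dimension correction that yields D2 is not reproduced here.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D(Y|X): (concordant - discordant) / number of pairs not tied on x."""
    concordant = discordant = untied_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x are excluded from the denominator
        untied_x += 1
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / untied_x


# Hypothetical deterministic pattern: binary item and total score for 5 test-takers
item = [1, 1, 1, 1, 0]
score = [4, 3, 2, 1, 0]
d_forward = somers_d(item, score)
d_reversed = somers_d([1 - v for v in item], score)
print(d_forward, d_reversed)
```

Under this assumed direction, a response pattern with no reversals gives exactly +1 and the mirrored item gives exactly -1, matching the abstract's statement that D reaches the extreme values.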

10.12973/ijem.6.2.297
Pages: 297-317
Article Metrics
Views
370
Download
834
Citations
Crossref
8

Scopus

...

Progress monitoring of academic achievement is an essential element to prevent learning disorders. A prominent approach is curriculum-based measurement (CBM). Various studies have documented positive effects of CBM on students’ achievement. Nevertheless, the use of CBM is associated with additional work for teachers. The use of tablets may be of help here. Yet, although many advantages of computer- or tablet-based assessments are being discussed in the literature (e.g., innovative item formats, adaptive testing, automated scoring and feedback), there are still concerns regarding the comparability of different assessment modes (paper-pencil vs. tablet). In the study presented, we analyze the CBM data of 98 fourth graders. They processed the exact same computation items once with paper and pen and once in a tablet application. The analyses point to comparable results in the test modes, although some significant deviations can be found at item level. In addition, the children report perceived benefits when working with the tablet.

10.12973/ijem.6.4.669
Pages: 669-680
Article Metrics
Views
1035
Download
1194
Citations
Crossref
13

Scopus

...

Although Goodman–Kruskal gamma (G) is used relatively rarely, it has promising potential as a coefficient of association in educational settings. Characteristics of G are studied in three sub-studies related to educational measurement settings. G appears to be unexpectedly appealing as an estimator of association between an item and a score because it strictly indicates the probability to get a correct answer in the test item given the score, and it accurately produces perfect latent association irrespective of distributions, degrees of freedom, number of tied pairs and tied values in the variables, or the difficulty levels in the items. However, it underestimates the association in an obvious manner when the number of categories in the item is more than four. To address this, a dimension-corrected G (G2) is proposed and its characteristics are studied. Both G and G2 appear to be promising alternatives in measurement modelling settings, G with binary items and G2 with binary, polytomous and mixed datasets.
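
The uncorrected coefficient can be sketched from its standard definition, G = (concordant - discordant) / (concordant + discordant), where tied pairs are ignored altogether. The function name and toy data below are assumptions for illustration, and the dimension-corrected G2 proposed in the article is not reproduced.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D), with all tied pairs ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical deterministic pattern: binary item and total score for 5 test-takers
item = [1, 1, 1, 1, 0]
score = [4, 3, 2, 1, 0]
g_perfect = goodman_kruskal_gamma(item, score)
# A small example with one discordant pair out of three
g_mixed = goodman_kruskal_gamma([1, 2, 3], [1, 3, 2])
print(g_perfect, round(g_mixed, 3))
```

Because ties drop out of both numerator and denominator, the many pairs tied on a binary item do not pull the coefficient below 1 when the remaining pairs are all concordant.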

10.12973/ijem.7.1.95
Pages: 95-118
Article Metrics
Views
874
Download
842
Citations
Crossref
9

Scopus

...

This study aims to produce empirical evidence of the validity and reliability of instrument items for the competency framework of agricultural teaching staff in Malaysian agricultural vocational colleges. The validity and reliability of the framework were analyzed using Rasch Model Measurement assisted by Winsteps 3.72 software. The research instrument contained 116 items and was distributed to 30 instructors at the Teluk Intan Agricultural Vocational College, Malaysia. Respondents were selected by stratified random sampling: the researcher divides the population into strata according to percentage and then selects randomly based on the desired percentage. Validity analysis of the instrument was done through four functional tests. For reliability and separation of respondents, it was found that the individual reliability value was very good and acceptable. The item polarity analysis detected no negative values in the Point Measure Correlation. Item matching analysis found that 11 items had to be dropped as they failed to meet the required conditions. The analysis of local dependence, which identifies dependent items based on the standardized residual correlation value, found 13 items that need attention. The results of the data analysis checking the functionality of the items suggested that some items should be dropped. With these items omitted, the results provide evidence that the competence instrument for agricultural instructors has a high level of validity and reliability for use in actual studies.

10.12973/ijem.7.3.411
Pages: 411-420
Article Metrics
Views
323
Download
570
Citations
Crossref
2

Scopus
2

...

This study reviews 60 papers that used a Likert scale and were published between 2012 and 2021. Screening for the literature review used the PRISMA method. Data analysis was carried out through data extraction, followed by structured synthesis using the narrative method. To achieve credible research results at the data collection and data analysis stages, a focus group discussion (FGD) was conducted. The findings show that only 10% of studies use a measurement scale with an even number of answer choices (4, 6, 8, or 10 choices). Most studies (90%) use a measurement instrument involving a Likert scale with an odd number of response choices (5, 7, 9, or 11), and the most popular choice among researchers is a five-point Likert scale. A rating scale with an odd number of responses greater than five (especially a seven-point scale) is the most effective in terms of reliability and validity coefficients, but if the researcher wants to direct respondents to one side, then a scale with an even number of responses (six points) may be more suitable. The presence of response bias and central tendency bias can affect the validity and reliability of a Likert scale instrument.

10.12973/ijem.8.4.625
Pages: 625-637
Article Metrics
Views
1382
Download
2391
Citations
Crossref
10

Scopus
4

Rethinking the Components of Regulation of Cognition through the Structural Validity of the Meta-Text Test

metacognition performance-based testing regulation of cognition structural validity

Marcio Alexander Castillo-Diaz, Cristiano Mauro Assis Gomes, Enio Galinkin Jelihovschi


...

The field of studies in metacognition points to some limitations in the way the construct has traditionally been measured and shows a near absence of performance-based tests. The Meta-Text is a performance-based test recently created to assess components of cognition regulation: planning, monitoring, and judgment. This study presents the first evidence on the structural validity of the Meta-Text, by analyzing its dimensionality and reliability in a sample of 655 Honduran university students. Different models were tested, via item confirmatory factor analysis. The results indicated that the specific factors of planning and monitoring do not hold empirically. The bifactor model containing the general cognition regulation factor and the judgment-specific factor was evaluated as the best model (CFI = .992; NFI = .963; TLI = .991; RMSEA = .021). The reliability of the factors in this model proved to be acceptable (Ω = .701 & .699). The judgment items were well loaded only by the judgment factor, suggesting that the judgment construct may actually be another component of the metacognitive knowledge dimension but having little role in cognition regulation. The results show initial evidence on the structural validity of the Meta-Text and give rise to information previously unidentified by the field which has conceptual implications for theorizing metacognitive components.

10.12973/ijem.8.4.687
Pages: 687-698
Article Metrics
Views
446
Download
770
Citations
Crossref
2

Scopus
1

Graded Response Models on the Curiosity Measurement of Elementary School Students

curiosity measurement elementary school graded response models

Herwin Herwin, Riana Nurhayati, Aprilia Tina Lidyasari, Augusto da Costa


...

Curiosity is one of the most important character traits for elementary school students. However, in practice, no standardized measurement model is yet available for teachers to identify students' curiosity. This study aims to develop a model for measuring the curiosity of elementary school students using the graded response model (GRM) approach. This research uses a descriptive quantitative method. The research sample consisted of 236 randomly selected elementary school students. Data were collected using a questionnaire of 16 statement items with a Likert scale. The data were analyzed using the item response theory approach with the GRM. The results showed that the model for measuring student curiosity in elementary schools had good location parameters, a good discrimination index, and a fairly good information function with a small estimation error. The curiosity measurement model in this study can be used as an alternative for teachers to identify students' curiosity in elementary schools.
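
To show how a graded response model turns a discrimination parameter and ordered location (threshold) parameters into response probabilities, the sketch below uses the standard logistic form: the probability of responding in category k or higher is 1 / (1 + exp(-a(theta - b_k))), and each category probability is the difference between adjacent cumulative probabilities. The parameter values and the four-category item are hypothetical; they are not estimates from the study.

```python
from math import exp


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    theta: latent trait level; a: discrimination; thresholds: increasing location parameters.
    """
    # Cumulative probabilities of responding in category k or above, padded with 1 and 0
    cumulative = [1.0] + [1 / (1 + exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item with four response categories (three thresholds)
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

The probabilities sum to 1 for any trait level, and raising theta shifts mass toward the higher categories.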

10.12973/ijem.9.1.53
Pages: 53-62
Article Metrics
Views
313
Download
571
Citations
Crossref
0

Scopus
0

...

The purpose of this systematic literature review (SLR) is to identify: (a) the topic of the study, (b) the research methods used, and (c) the results of research on Mathematics education in Malaysia. This study discusses the use of teaching aids (TA) in the field of measurement and geometry for Form 2 students. The use of TA is considered highly successful and relevant for educators to improve the quality of the teacher’s instructions and students’ understanding. Therefore, using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines by Moher et al. (2015), a systematic review was carried out to determine the appropriate strategies and variables for the field. Four stages constitute the PRISMA paradigm used in this study: identification, screening, eligibility, and inclusion. Using criteria selected by the researchers across multiple sources, including Google Scholar, ResearchGate, Scopus, and Emerald, over 20 papers were identified for additional investigation. The data were then analysed quantitatively to describe the research's findings. From the results, two main research themes were found, namely (a) learning to use TA; and (b) the field of measurement and geometry of Mathematics. The results of the article analysis indicate that Mathematics education in Malaysia is currently at a moderate level and is ineffective at fostering students' understanding and interest. These results are anticipated to serve as the foundation for teachers, students, schools, and the Ministry of Education to undertake more engaging and interactive learning, particularly in the subject areas of mathematics and geometry.

10.12973/ijem.9.2.387
Pages: 387-396
Article Metrics
Views
213
Download
520
Citations
Crossref
0

Scopus
0

Validation of the Adolescent Social Identity Measure: Adolescents’ Perception of Themselves in a Social Context

adolescents confirmatory factor analysis social identity validation

Annemaree Carroll, Julie M. Bower, Jenny Povey, Sandy Muspratt, Holly Chen


...

Social identity is an important social determinant of student outcomes such as mental health and well-being. Currently, no validated social identity measures exist for adolescents in secondary school settings. A new ‘Adolescent Social Identity’ measure was developed by adapting two social identity dimensions from a validated reputation enhancement scale. The Social Identity Measure comprises two scales of 10 items each to measure how adolescents think their peers view them (e.g., reputational status) in terms of their conforming and nonconforming behaviour (Self-perception of Public Self) and how adolescents would ideally like to be viewed (Ideal Public Self) by peers. Exploratory and confirmatory factor analyses were conducted along with assessments of reliability, validity, and measurement invariance. Conforming and Nonconforming subscales for both scales were shown to be reliable, valid, and invariant across age and gender groupings. There were significant but small differences in the latent means for gender.

10.12973/ijem.9.3.551
Pages: 551-565
Article Metrics
Views
217
Download
391
Citations
Crossref
0

Scopus
0

...

The role of artificial intelligence (AI) in education remains incompletely understood, demanding further evaluation and the creation of robust assessment tools. Despite previous attempts to measure AI's impact in education, existing studies have limitations. This research aimed to develop and validate an assessment instrument for gauging AI effects in higher education. The initial 70-item instrument covered seven constructs and was analyzed using several methods, including Exploratory Factor Analysis (EFA), Confirmatory Factor Analysis (CFA), and Rasch Analysis. The instrument was administered to 635 students at Nueva Ecija University of Science and Technology – Gabaldon campus, and content validity was assessed using the Lawshe method. After eliminating 19 items through EFA and CFA, Rasch analysis confirmed the construct validity and led to the removal of three more items. The final 48-item instrument, categorized into learning experiences, academic performance, career guidance, motivation, self-reliance, social interactions, and AI dependency, emerged as a valid and reliable tool for assessing AI's impact on higher education, especially among college students.
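
Two pieces of arithmetic in this abstract can be checked with a short sketch: the item count after each elimination step, and Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating an item essential and N is the panel size. The panel size and ratings used below are hypothetical, since the abstract does not report them.

```python
def lawshe_cvr(n_essential, n_panelists):
    """Lawshe's content validity ratio for one item: (n_e - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half


# Item counts reported in the abstract: 70 initial, 19 removed by EFA/CFA, 3 removed by Rasch analysis
remaining_items = 70 - 19 - 3
print(remaining_items)

# Hypothetical panel of 10 experts, 8 of whom rate an item "essential"
cvr_example = lawshe_cvr(8, 10)
print(cvr_example)
```

CVR ranges from -1 (no panelist rates the item essential) to +1 (all do), with 0 meaning exactly half the panel rated it essential.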

10.12973/ijem.10.2.997
Pages: 197-211
Article Metrics
Views
73
Download
212
Citations
Crossref
0

Scopus
0

...