Tablet or Paper and Pen? Examining Mode Effects on German Elementary School Students’ Computation Skills with Curriculum-Based Measurements

Progress monitoring of academic achievement is an essential element in preventing learning disorders. A prominent approach is curriculum-based measurement (CBM). Various studies have documented positive effects of CBM on students’ achievement. Nevertheless, the use of CBM is associated with additional work for teachers. The use of tablets may be of help here. Yet, although many advantages of computer- or tablet-based assessments are discussed in the literature (e.g. innovative item formats, adaptive testing, automated scoring and feedback), there are still concerns regarding the comparability of different assessment modes (paper-pencil vs. tablet). In the study presented, we analyze the CBM data of 98 fourth graders. They processed the exact same computation items once with paper and pen and once in a tablet application. The analyses point to comparable results in the test modes, although some significant deviations can be found at item level. In addition, the children report perceived benefits when working with the tablet. The results suggest that a tablet-based application can reduce the perceived disadvantages of using CBM in everyday school life (no additional financial and time expenditure for teachers and less pressure solving computation tasks for students), with comparable results for students.


Progress Monitoring of Academic Skills with Curriculum-Based Measurements
The repeated collection of student performance data has established itself in Germany as an important and independent field of research (e.g. Voß & Gebhardt, 2017). The term progress monitoring covers all those instruments and methods that aim at a formative evaluation of instruction and intervention in school contexts and have been sufficiently tested regarding their psychometric suitability (Klauer, 2014). The aim of this approach is to map and document learning processes in order to detect learning deficits as early as possible, to draw conclusions about effective pedagogical interventions, and to implement data-based decisions. Progress monitoring is thus an essential element of school-based prevention of learning disorders in Germany (Hartke, 2005; Huber & Grosche, 2012; Kretschmann, 2009) and also forms a basis for evidence-based pedagogical action in individual cases.
A very prominent approach to the diagnosis of learning trajectories is curriculum-based measurement (CBM; Deno, 1985, 2003; Fuchs & Fuchs, 1986; Hosp et al., 2007). CBM are short tests intended to map learning progress in specific competencies in different learning areas (Stecker et al., 2008). CBM were originally developed in the USA in the 1970s as an evaluation tool to assess the effectiveness of special education measures. The special feature of the procedures is that they give equal weight to psychometric quality, which ensures accurate and valid data, and to simple implementation (easy administration and scoring at low cost), which ensures high-frequency use in educational practice (Deno, 2003).
Against the background of the previously reported findings on the effectiveness of CBM, especially in connection with digitally supported processing of student data, and the advantages and availability (at least in the private sphere) of modern digital media, the question arises to what extent digitally collected results differ from those collected with paper-pencil-based versions (so-called mode effects; Hensley, 2015; Huff & Sireci, 2001).

Literature Review
While the international state of research shows great potential for the use of CBM in schools, there is currently only a small and slowly growing range of procedures in the German-speaking world (Hesse & Latzko, 2017; Voß & Gebhardt, 2017). This discrepancy is probably due primarily to the multitude of challenges in the conception and evaluation of progress monitoring instruments. In addition to the evidence required for common status diagnostic instruments (objectivity, reliability, and validity), further requirements apply to progress monitoring instruments. These are meant to ensure that the results can be interpreted correctly over time. According to Fuchs (2004), in addition to psychometric suitability, aspects relevant to practice are particularly important: the applicability, usefulness, and effectiveness of CBM in school practice.
While various ways of constructing and evaluating instruments have already been described in the German literature (Gebhardt et al., 2015; Wilbert & Linnemann, 2011), the usability and applicability of CBM is currently receiving particular attention. This is linked to questions of the successful implementation of progress monitoring tools in school practice (Hebbecker & Souvignier, 2018; Voß, 2014). In this context, the significance of modern digital media in the school setting is discussed (Fuchs, 2004).
A computer-/tablet-based progress monitoring tool offers many advantages for teachers (see table 1). On the one hand, the frequent use of printed CBM is very cost-intensive, which is problematic in the context of limited printing quotas in schools. In addition, the repeated use of CBM requires time to collect, evaluate and interpret the data. In a survey of German teachers conducted by Voß (2014), the participants rated the regular use of paper & pencil CBM as a significant (although justifiable against the background of the gain in knowledge) additional expenditure. A further problem area is the possible restriction of consistency in evaluation within the school staff which could lead to a limited objectivity of evaluation and interpretation (Fuchs et al., 1994).
Digital assessments (whether with the computer or the tablet) lead to a higher time-efficiency. Especially data evaluation, presentation, and documentation, which would be the task of teachers, can thus be automated and run in a much more resource-efficient manner (Fuchs, 2004). In addition, it is much easier to provide support for data interpretation.
In Germany, some websites are available to support the progress monitoring of students. The website www.quop.de has been extensively tested and evaluated. The present research shows that computer-based tests are very well accepted by practitioners and lead to higher learning gains (Förster & Souvignier, 2014; Souvignier et al., 2014). Previous experience shows similar tendencies for the internet platforms www.levumi.de (Mühling et al., 2017) and www.lernlinie.de (Voß et al., 2020).

Table 1. Arguments for and against paper-pen-based and tablet-based assessment
Arguments for a paper-pen-based approach:
• Familiar working tool in schools
• Low availability of tablets in schools
• Continuous maintenance for tablets is necessary

Arguments for a tablet-based approach:
• Cost-effective acquisition of current devices is possible
• Intuitive work
• Positive impact on children with learning disabilities (Genz & Bresges, 2017)
• Increased motivation of students (Bremer & Tillmann, 2014)
• Use of fewer resources compared to paper-based work
• Innovative task formats, adaptive assessment, and automated feedback possible (Bridgeman, 2009; Redecker & Johannessen, 2013)
• Efficacy of computer-based data analysis (Fuchs et al., 1988, 1989)

The question of mode effects (here the influence of different administrative conditions on the results of digital as opposed to analogue tests) is important in order to assess whether the results obtained can be interpreted as equivalent (American Educational Research Association, 2014). To test for these differences, the two modes of implementation must be subjected to an empirical comparison. The findings indicate comparability when the results and distributions of students are similar across both formats (Pomplun et al., 2002).
There is already a comprehensive body of research on mode effects in paper-pencil-based and computer-based tests in general (Bennett et al., 2008; Cayton-Hodges et al., 2015; Hensley, 2015; Huff & Sireci, 2001; Poggio et al., 2005; Pommerich, 2004; Wang et al., 2007). However, the research findings are not uniform (Hensley, 2015). Some studies show that the mode of administration has little or no impact on student performance (Hargreaves et al., 2004; Peak, 2005; Wang et al., 2007), while others point to advantages for paper-pencil-based testing (Bennett, 2003). There are also studies that show better performance in computer-based procedures (Clariana & Wallace, 2002; Pomplun et al., 2002). The findings are further limited in several respects: many studies refer to higher grades or adulthood; different test areas (mathematics, reading, or writing skills) and test formats were investigated; PCs were generally used rather than mobile devices; most comparative studies investigated mode effects only at the overall test level and not at item level; and the use of CBM was only occasionally investigated specifically. Against this background, the questions relevant to the present paper arise.

Research Goal
The present study aims to expand the existing body of research. For this purpose, the results of primary school children in a thoroughly evaluated CBM for the assessment of addition and subtraction competencies are compared between a paper-pencil-based and a tablet-based test. The following questions guide the research:
1. Can paper-pencil-based CBM also be implemented as a tablet version (applicability)?
2. Are there differences in the results of paper-pencil-based and tablet-based CBM procedures (mode effects)?
3. Does the tablet-based implementation have advantages over the paper-pencil-based version (advantages of digital media)?

Sample and Data Collection
A total of N = 98 children from five fourth-grade classes took part in the study. The gender ratio was approximately balanced with 54.1 % girls (N = 53). Illness-related data losses of individual children were recorded at individual test times. Due to data protection objections of the participating schools and parents, no analysis of the children's origin and socioeconomic status could be carried out.
As the dependent variable, the results of the children in the addition and subtraction CBM of Sikora and Voß (2017) were used. To construct the CBM, clustered item pools were generated for each item position of a test according to mathematical-didactic as well as empirical aspects (N = 24). From these, tasks were randomly selected and transferred to the corresponding test versions. In this way, ten different but structurally similar CBM versions could be created for the fourth school year. A comprehensive longitudinal study (N = 463 primary school students) showed the psychometric and diagnostic suitability of the method. According to this, all items meet the requirements of a one-dimensional Rasch model (0.7 ≤ Infit ≤ 1.3; Bond & Fox, 2015) and show satisfactory item difficulties and selectivity. Reliability (Cronbach's α = .86; range from .84 to .88) as well as validity coefficients (significant correlation of r = .64 with a standardized mathematics test; significant correlation of r = .31 with a standardized reading and spelling test) are also adequate. The sensitivity to change of the developed CBM could be shown by means of multi-level modelling.
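To make the reliability coefficient mentioned above more concrete, the following is a minimal sketch of how Cronbach's α can be computed from dichotomously scored item responses. The function name and the toy data are illustrative assumptions and are not taken from the original study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students-by-items matrix of item scores.

    scores: list of per-student lists (hypothetical 0/1 item scores),
    all of the same length.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across students
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the students' total scores
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (assumed): four students, three items
demo = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(demo), 2))
```

Because the ratio of summed item variances to total-score variance is the same for population and sample variances, the choice of `pvariance` does not affect the result.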
In addition to the CBM, the children completed a questionnaire. The questionnaire contained a total of 21 items relating to the scales "use of the tablet", "motivation to use the tablet", "(perceived) advantage of using the tablet", "motivation to use the tablet in general", "use of the tablet at home", "attitude towards mathematics" and "use of the tablet at school" (see Table 2). The items were presented as ratings on a 4-level Likert scale with the extreme values "I fully agree" (1) and "I do not agree" (4). In the context of the study presented here, only selected findings from the student survey are discussed in detail. In addition to the data listed above, teachers were also asked to provide the half-yearly grades for each child.
The tests were administered in the middle of the fourth school year within one school week as part of a daily exercise. On two consecutive days, the versions for addition were used first in paper-pencil-based or tablet-based form. One part of the class received the paper version, the other part the tablet version. On the next test day this was exchanged accordingly. On the following two school days, the versions for subtraction were used in the same way.
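The counterbalanced schedule described above can be sketched as follows. This is only an illustration: the text does not state how the class was split or whether the same group started with paper in both domains, so the random split, the seed, and the key names are assumptions.

```python
import random


def crossover_schedule(student_ids, seed=0):
    """Assign each student a mode order so that both modes are used once per domain.

    Assumption: the class is split at random into two halves; the half that
    starts with paper for addition also starts with paper for subtraction.
    """
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for position, sid in enumerate(ids):
        first = "paper" if position < half else "tablet"
        second = "tablet" if first == "paper" else "paper"
        schedule[sid] = {
            "addition_day1": first,
            "addition_day2": second,
            "subtraction_day3": first,
            "subtraction_day4": second,
        }
    return schedule


plan = crossover_schedule(range(1, 7), seed=42)
for sid in sorted(plan):
    print(sid, plan[sid])
```

A within-subject design of this kind lets every child serve as their own control, while the swapped order guards against simple practice effects favouring one mode.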

Data Analysis
The Shapiro-Wilk test reveals that none of the data are normally distributed (addition paper & pencil: W(74) = .502, p < .001; addition tablet: W(74) = .486, p < .001; subtraction paper & pencil: W(74) = .425, p < .001; subtraction tablet: W(74) = .367, p < .001). For data analysis, correlations between the results of the paper-pencil-based and tablet-based tests were calculated separately for the areas of addition and subtraction. Differences in the results were examined using a graphical model test as well as inferential statistical analysis of the different test modes. In addition, the reliabilities of the paper-pencil-based and tablet-based tests and their correlations with the half-year grade were compared as further indicators of possible mode effects (Hensley, 2015).

Results at the Overall Test Level
The comparison of the distributions of the paper-pencil-based vs. tablet-based test values does not show any significant differences (see Table 3). This applies to both the addition and subtraction CBM. This finding is further supported by a graphical estimation of measurement invariance. The item parameters were plotted on the X- and Y-axes separately according to their implementation mode (see Figure 1). Since the item parameters run along the bisector, it can be assumed that the test is measurement invariant across modes. Inferential statistics likewise show no significant differences for either addition or subtraction (see Table 3).
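The idea behind the graphical model test can be illustrated with a rough sketch: estimate an item difficulty for each item separately in both modes and check how far each point lies from the bisector y = x. The study presumably used Rasch item parameters; the smoothed logit of the proportion correct used here is only a simple stand-in, and the function names and the threshold are assumptions.

```python
import math


def logit_difficulty(responses, item):
    """Crude item difficulty: negative logit of the (smoothed) proportion correct."""
    n = len(responses)
    correct = sum(row[item] for row in responses)
    p = (correct + 0.5) / (n + 1.0)  # smoothing avoids log(0) for extreme items
    return -math.log(p / (1.0 - p))


def bisector_deviations(paper, tablet):
    """Tablet minus paper difficulty per item; 0 means the point lies on y = x."""
    k = len(paper[0])
    return [logit_difficulty(tablet, i) - logit_difficulty(paper, i) for i in range(k)]


def flag_items(deviations, threshold=0.5):
    """Indices of items whose deviation exceeds a (hypothetical) threshold in logits."""
    return [i for i, d in enumerate(deviations) if abs(d) > threshold]


# Toy 0/1 response matrices (assumed): rows are students, columns are items
paper = [[1, 1], [1, 0], [0, 1], [1, 1]]
tablet = [[1, 0], [1, 0], [0, 0], [1, 1]]
devs = bisector_deviations(paper, tablet)
print([round(d, 2) for d in devs], flag_items(devs))
```

Positive deviations indicate items that were harder on the tablet, negative deviations items that were harder on paper.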

Figure 1. Graphic model test on measurement invariance for the CBM addition (left) and subtraction (right)
According to Cohen (1988), the correlations between the test modes are high (see Table 3), but significantly lower for the addition CBM (r = .63, N = 87) than for the subtraction CBM (r = .84, N = 85) (z-test over the correlation coefficients: z = -3.09, p = .001). The correlations with the half-year grade in mathematics as an external criterion indicate negligible differences (addition: tablet r = -0.60 vs. paper-pencil r = -0.59; subtraction: tablet r = -0.53 vs. paper-pencil r = -0.58; z-test over the correlation coefficients: z = 0.46, p = .32). The correlations are negative because the raw scores in the CBM and the grades are scaled in opposite directions: higher CBM scores indicate better performance, whereas lower grades do.
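The z-test over correlation coefficients can be sketched with Fisher's r-to-z transformation. The sketch below assumes the standard formula for two independent samples and a one-tailed p-value; the text does not state which variant was used, but under these assumptions the reported value for the comparison of the addition and subtraction CBM is reproduced. The function name is illustrative.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for the difference between two correlations.

    Assumes independent samples; returns (z, one-tailed p).
    """
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # One-tailed p from the standard normal distribution
    p_one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_one_tailed


# Correlations between test modes as reported above
z, p = compare_correlations(0.63, 87, 0.84, 85)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = -3.09, p = 0.001
```

Since both correlations stem largely from the same children, a test for dependent correlations would be a stricter alternative; the independent-samples version is shown only because it matches the reported figures.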

Results on Item Level
The proportion of matching results between the two test modes for each item is on average m = 77.1 % (sd = 12.4 %) for the addition range. For item 22, for example, the agreement is only 52.6 % (see Table 4). The task is to solve 29600 + 53700 = ____. The distribution of correct and incorrect solutions for the item is shown in Table 4. According to this, the paper-pencil form is more likely to provide a correct solution. The highest percentage of identical results for the addition is found for item 2 (97.7 %) (see Table 5). Similar values are found for the subtraction area (m = 80.1 %, sd = 8.6 %, min = 65.0 %, max = 95.0 %).
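The item-level agreement described above can be computed as the share of children whose result (correct or incorrect) on an item is the same in both modes. The following minimal sketch assumes complete, matched 0/1 response matrices; the function names and toy data are illustrative and not taken from the study.

```python
from collections import Counter


def item_agreement(paper, tablet):
    """Percentage of students with identical results in both modes, per item."""
    n = len(paper)
    k = len(paper[0])
    return [
        100.0 * sum(1 for s in range(n) if paper[s][i] == tablet[s][i]) / n
        for i in range(k)
    ]


def crosstab(paper, tablet, item):
    """Counts of (paper result, tablet result) pairs for one item."""
    return Counter((p[item], t[item]) for p, t in zip(paper, tablet))


# Toy data (assumed): rows are the same students in both matrices
paper = [[1, 0], [1, 1], [0, 1], [1, 1]]
tablet = [[1, 1], [1, 1], [0, 0], [0, 1]]
print(item_agreement(paper, tablet))  # [75.0, 50.0]
print(dict(crosstab(paper, tablet, 1)))
```

The cross-tabulation for a single item corresponds to the kind of table of correct and incorrect solutions referred to for item 22.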

Results of the Student Survey
The student survey indicates that the use of tablets at school is rather rare. The statement "At school I often work on the tablet." was rated an average of m = 3.51 (sd = 0.85; on a 4-level Likert scale) by the children. Nevertheless, the handling of the tablet-based test in class seems realistic. The children stated that they found it easy to handle the tablet in the test situation (m = 1.21, sd = 0.47) and that they could see everything well (m = 1.31, sd = 0.65). Problems were only reported sporadically (m = 3.47, sd = 0.94).
The answers of the students indicate a tendency that the tablet-based test is perceived as more advantageous. While the distribution of the answers to the questions according to the difficulty of the tasks on the tablet is very balanced (see Figure 2, left), the majority of the respondents assume that they can solve the tasks more quickly with the tablet (see Figure 2, right).

Figure 2. Distribution of answers regarding the aspect "(perceived) advantages of using tablets"
A differentiated consideration of the distribution of responses to the aspect "(perceived) advantage of using a tablet" against the background of a rather negative attitude towards mathematics points to further advantages of tablet-based testing (see Figure 3). Slightly more than half of the children surveyed said that difficult math tasks made them feel uncomfortable ("I fully agree" and "I tend to agree"), and nearly half of them indicated that this feeling did not occur when testing with the tablet.

Discussion
Against the background of the importance of CBM-based progress monitoring of students' performance, but also the challenges of its implementation in practice (additional time and resource-related expenditure), this article elaborated on the importance of digitally supported progress monitoring using tablets and on possible discrepancies between the different test modes (here: paper-pencil-based vs. tablet-based). The aim of the present study was therefore to examine whether paper-pencil-based CBM can also be implemented as a tablet version (usability), whether differences between the test conditions become apparent (mode effects), and whether a tablet-based implementation has advantages over the paper-pencil-based version (advantages of digital media).
Basically, the applicability of tablet-based CBM in the classroom seems possible. This is supported by the findings, according to which many German children have access to a tablet at home (Educational Media Research Association Southwest, 2016). The student survey conducted here shows an even clearer picture. Around 64% of the children questioned in this study have a tablet at home and use it at least one day a week. The tablet therefore is a familiar medium for children. Accordingly, the students examined found it easy to operate the devices in the test situation.
To clarify the question of the extent to which paper-pencil-based CBM can be carried out as a tablet version, various analyses were conducted in this study. Following Hensley (2015), it must be shown that the mean values and standard deviations are comparable for both forms of test performance. The same applies to the reliabilities and correlations with external criteria. Only when these criteria are met can the use of a new evaluation method be considered equivalent (American Educational Research Association, 2014). The descriptive results determined here at the overall test level show high similarities for both test modes. No significant differences could be detected by variance analysis or by using a graphical model test to determine the measurement invariance. This is consistent with existing research findings (Poggio et al., 2005; Pommerich, 2004; Wang et al., 2007). We also found high correlations between the test modes. The correlations with the half-year grade in mathematics are also comparable. These findings taken together indicate that there are no mode effects between the test methods used here and that the results can be considered equivalent in each case. In a differentiated analysis at item level, however, quite large deviations in the students' results were registered in some cases (up to just under 50 %). These differences can be seen to varying degrees between the areas of addition and subtraction. As an example, it was shown that items with large numbers are more likely to be answered incorrectly when they are processed on the tablet. Similar significant effects of the test mode have been reported earlier (Bennett et al., 2008; Hensley et al., 2016; Jerrim, 2016). There are two main lines of argumentation to explain these differences (Hensley, 2015): a) differences due to the technology used or b) differences due to subject factors.
In terms of technology, results could be affected by the fact that there was no way to edit and change tasks retroactively within the CBM app (Vispoel, 2000). Differences due to personal factors are also conceivable here (Lottridge et al., 2011;Poggio et al., 2005). Since the processing time was limited to 15 minutes regardless of the test mode, weaker students in particular were no longer able to complete the last and most difficult tasks. This reduces the internal validity of the presented findings. Overall, however, the deviations in individual items do not have a serious influence on the overall test results.
Finally, the advantages of a tablet-based test compared to a paper and pencil test need to be clarified (Cozad & Riccomini, 2016). In general, it can be said that the motivation of the children was significantly higher when they worked on the tablet. Interestingly, several students stated that they were able to complete the tasks on the tablet faster (almost 50 %) or that the tasks were easier (almost 60 %). However, the data collected indicate that, on average, the children completed about the same number of items in both test conditions (about 22-23 items). In contrast to the results of the survey, the children solved more items correctly on average in the paper-pencil-based test (addition: mPaper = 17.59 vs. mTablet = 16.80; subtraction: mPaper = 14.88 vs. mTablet = 13.40). However, these deviations are not statistically significant. Even if the advantages of the tablets described by the children cannot be reproduced in the test data, these perceived advantages, together with the more economical test implementation and evaluation, speak in favor of the use of tablet-based CBM in schools. The perceived advantages could arise because children nowadays are confronted with and use a wide range of digital media (Arslan-Cansever, 2019). Particularly relevant in this context seems to be the finding that about half of those students who stated that difficult mathematics tasks made them feel uncomfortable (just over 50 %) did not have this uncomfortable feeling when they completed the tasks on the tablet. Thus, working with the tablets seems to reduce the emotional pressure on the children when working on the items. This is particularly important in view of the limited time available for processing when using CBM (Hosp et al., 2007). It is a key factor because math anxiety has a big impact on math performance (e.g. Kesici & Bindak, 2019).
Another advantage of a tablet application is the possibility to determine the time needed to solve each task automatically, so that it is not necessary to limit the processing time. However, there are special circumstances for schools in Germany. Although there has been a marked increase in the frequency of regular private computer use in recent years (in 2003, 82 % of the German 15-year-olds surveyed stated that they used the computer regularly in a domestic context, in 2006 the proportion was 90 % and in 2012 almost 100 %; Organisation for Economic Cooperation and Development, 2015; Senkbeil & Wittwer, 2007), the use of computers in class is significantly lower. For example, only 23 % (2003) and 31 % (2006) of students reported using computers regularly in class (Senkbeil & Wittwer, 2007). Although this percentage rose to just under 70 % by 2012, Germany is still below the OECD average (Organisation for Economic Cooperation and Development, 2015). The present study also shows that German children rarely use a tablet in class (m = 3.51, sd = 0.85 on a 4-level Likert scale).
One possible explanation is that there are not enough digital devices available in schools to enable all students to work with them. Even though the number of computers, notebooks or tablets has increased in recent years (Hensley, 2015), in 2006 an average of 12 students still shared a PC, and 174 children a notebook, in the school setting (German Federal Ministry of Education and Research, 2006). The recently published funding program of the German Federal Ministry of Education and Research (2019) is intended to improve the supply of digital devices in German schools.

Conclusion
The present study shows that a tablet-based application can significantly reduce the perceived disadvantages of using CBM in everyday school life (no additional financial and time expenditure for teachers and less pressure solving computation tasks for students), with comparable results for students.
Moreover, the findings indicate that the students perceive working on the CBM on the tablet as advantageous. They find the items simpler, and a considerable proportion of students who feel emotional stress when solving hard computational tasks do not have this feeling when working on the tablet.
The corona pandemic has also highlighted the need for efficient digital infrastructures in schools and for digital educational content (Casale et al., 2020; Nobel et al., 2020).
The funding directive for the improvement of technical equipment at German schools described above is promising here (German Federal Ministry of Education and Research, 2019). This provides the technical framework for digitally supported progress monitoring. However, it is of far greater importance to make teachers at German schools aware of the advantages of using CBM in class and for accompanying interventions, thus ensuring increased demand.

Suggestions
CBM are a tool to make educational decisions based on reliable and objective data (Jung et al., 2018). In this way the instruction or intervention is evaluated, and a modification can be initiated if necessary (e.g. Stecker et al., 2005). The positive effects of progress monitoring have been reported several times in the international literature. In Germany, the concept of data-based decision-making based on progress monitoring of student performance has only partially gained acceptance in school practice. There are several explanations for this, from which suggestions can be derived.
First, it should be noted that German teachers have different views on data relevant to school and related decisions than teachers from the USA (Blumenthal et al., submitted). Accordingly, the demand for CBM in Germany is certainly lower. In general, questions regarding the successful implementation of CBM in German schools do not yet seem to have been sufficiently clarified and should be the focus of further research (Hebbecker & Souvignier, 2018).
Another suggestion is that teachers in Germany should be made aware of the idea of data-based decision-making. This topic should be given a stronger focus in university education (Wagner et al., 2017) and should also be addressed with teachers already working in school practice. In the literature there is a lack of evaluated concepts for further training and coaching of teachers in the confident handling of data in the sense of a data-based decision-making approach. Initial findings from Espin et al. (2017) should be further expanded by systematic research.
Furthermore, the design and evaluation of instruments for reliable and valid progress monitoring in Germany should be a priority in research (Voß & Gebhardt, 2017).
Digital test environments offer a great advantage for supporting the application and implementation of data-based decision-making in everyday school life. They reduce the additional work involved in test execution and evaluation and allow automated interpretation of test results in terms of comprehensive student performance profiles and tailored support information (Hosp et al., 2007; Kingston & Nash, 2011). Researchers should work on possibilities to support the effective data use of teachers; digital media can play a significant role here.

Limitations
Finally, we would like to point out limitations of the results presented here. The students worked on each CBM for 15 minutes. Usually, however, CBM are limited to a few minutes. Whether the findings are transferable to such short processing periods remains open. It must also be noted that this study was designed as a cross-sectional study. Whether comparable results can be achieved in a longitudinal design cannot be conclusively clarified either.
The findings reported here relate exclusively to computation tasks in addition and subtraction. A generalization of the findings to other types of assessments is not justified. Cayton-Hodges et al. (2015), for example, give important hints for the selection and development of apps as assessment tools.