High School Principals’ Ability to Estimate Work Time

Time management for educational leaders remains highly relevant to scholars, policymakers, and practitioners. We analyzed survey responses from 98 public high school principals to examine the congruence between the average total hours they reported working per week and the sum of the average hours they reported working per week in each of five distinct categories of leadership tasks. The observed congruence was 0.32, while Cohen’s kappa coefficient was 0.10. Female principals tended to underreport, and male principals tended to overreport, total work time. Principals with doctorate degrees exhibited higher congruence than those without, and overreporting was inversely related to highest degree. Principals in charge of large teaching staffs were more likely than their counterparts to be congruent and less likely to overreport total work time. Self-report appears to be an inaccurate method to measure time use among high school principals. If time use is a key component of the quality of principal leadership, more detailed and robust techniques for collecting time use data should be utilized in future studies.


Introduction
The Dimensions of School Leadership Study (DSLS) examined leadership practices and characteristics of public high school principals working under a variety of geographic, demographic, and social contexts in the state of Missouri (Ongaga et al., 2020). There has been a growing body of international research examining the nature and purpose of school principalship and its relationship to school improvement (Andrew et al., 1986; Andrew & Soder, 1987; Day & Sammons, 2014; Gurr, 2017; Gurr et al., 2007; Gurr & Day, 2014; Leithwood et al., 2004; Mulford et al., 2008). The strong link between principal leadership and student achievement has caused educational leadership to be recognized as one of the most influential school-level factors on student achievement (Brockmeier et al., 2013; Fullan, 2001; Hallinger & Heck, 1998; Leithwood et al., 2004; Randall & Jones, 1988; Sergiovanni, 2001; Valentine & Prater, 2011). According to , effective principals are those who focus on improving student achievement and are "hands-on leaders, engaged with the curriculum and instruction issues, unafraid to work directly with the teachers, and often present in classrooms" (p. 66). However, studies on how principals spend their time (e.g., Grissom et al., 2010; have shown that they rarely measure up to this ideal. Multiple researchers have found that principals spend minimal amounts of time on instructional leadership activities (Grissom et al., 2015; May & Supovitz, 2011; May et al., 2012; Murphy, 1990; Murphy et al., 2007). In the review of the literature, we present a variety of methods utilized in educational leadership research to quantify principals' time use, each with its own pros and cons.

Self-Report Time Use Surveys
Self-report surveys have been widely used to solicit information about use of time in different areas of research, largely due to their low cost and low burden for the respondents (Fink, 2017). However, self-report surveys have been criticized for their vulnerability to recall bias, particularly those in which respondents are asked to recall activities or experiences which are not recent (Camburn et al., 2010) or cover a long (Tourangeau et al., 2000) or indefinite (e.g., 'on average') time period. Lavigne et al. (2016) analyzed self-report data from 5950 principals collected during the 2011/12 school year through the U.S. Department of Education's National Center for Education Statistics Schools and Staffing Survey (2014). Principals were asked to estimate total hours spent per week for their job and the percent of time spent on each of five job-related activity groupings. The researchers converted the five percentages provided into numbers of hours based on the principal's estimated total weekly work hours. While a sample size of this magnitude may mitigate error associated with recall bias, the researchers provide a caveat that findings based on this method "should be interpreted with caution. It is possible that principals either underestimated or overestimated time spent on the job or on certain tasks" (p. 8).
Another methodological issue that often arises with time use surveys is related to cyclical variation over the school year (Camburn et al., 2010). They argued that the way a principal spends their time in the early part of the school year is likely to be quite different from the middle or end of the school year. Hence, a study design that solicits time use information about a single day or week (say, the prior day or week) in order to reduce recall bias is still subject to the effect of cyclical variation. This issue applies as well to a study design that solicits time use information 'on average', because a principal's recall is likely to be influenced by their recent experiences. The Camburn et al. (2010) study provides strong evidence against relying on reports or observations based on a single time-point or indefinite time period. They conducted a study in which principals were asked to complete daily logs for seven 5-day periods over a 2-year timeframe. This design permitted them to decompose the variance of their time use measures for six leadership functions using a hierarchical linear model (HLM). The estimated proportions of variance between principals were much smaller (.001 to .283) than the proportions of variance between days (.717 to .999). The authors reference Weick (1996) and argue that it is reasonable to "expect principals' emphasis on a particular leadership function to vary substantially from day to day. Indeed, the vast majority of the variation in principal's engagement in the six leadership functions was found in day-to-day fluctuation" (p. 721). They further observe: Variance estimates from the HLM analyses…clearly indicate that either measuring a single day or asking principals to provide a summary of their practice across a long span of time, such as an entire school year, may misrepresent what principals do because important fluctuations in their work would be obscured. (pp. 730-731)
Some researchers have examined use of time without attempting to estimate hours at all (Augustine et al., 2009; Ghamrawi & Al-Jammal, 2013). Ghamrawi and Al-Jammal asked 60 private-school principals from different regions of Lebanon to rate leadership tasks using a 4-point Likert scale with values 1=rarely/never, 2=occasionally, 3=often, and 4=always. Augustine et al. asked 598 principals in 17 school districts in various regions of the U.S. to rate time spent on nine instructional leadership tasks over the past school year using a simple dichotomous scale of 0=no time or some time and 1=a great deal of time.
Ghamrawi and Al-Jammal (2013) point out a limitation common to all self-report data, regardless of method, stating, "it would be more valid to request the admin team surrounding the principal to complete surveys as well" (p. 64). Grissom and Loeb (2011), who examined self-report effectiveness ratings from 279 principals on leadership skills, concurred: While principals' self-ratings reflect an informational advantage in the sense that principals experience themselves performing the tasks, principals are not unbiased observers and may not provide objective assessments. One means of evaluating the validity of the self-assessments is to check them against the ratings of another observer. (p. 1111) To this end, the researchers administered the same survey instrument to assistant principals in the same school district. They observed that effectiveness ratings from the assistant principals were, on average, lower and more variable than the principals' self-ratings.

Structured Observation
One of the research methods that is known to produce accurate data is structured observation, which involves real-time recording of time spent throughout the workday on individual tasks (Glantz et al., 2019; Bentley et al., 1994; Martinko & Gardner, 1990). Under this method, a trained observer "shadows" the principal during their workday and records, at set time intervals (e.g., 5 minutes), the activity occupying the principal at the moment. The log sheets can either be free-form, in which the observer jots down a brief description of the activity, or grid-based, in which the researchers have devised a priori a list of potential activities such that the observer simply checks a relevant box. When free-form log sheets are used, researchers are required to tediously categorize the activity descriptions for all of the participants in order to analyze the data. Even with a grid-based log sheet, the structured observation method can be expensive to implement for a large sample, due to costs related to training and incentivizing the observers. Another disadvantage of structured observations is that a principal might either alter their behavior due to the presence of an observer or might be uncomfortable with the arrangement and thus be reluctant to participate, particularly if the design calls for multiple days of observation. Camburn and Barnes (2004) conducted a mixed-methods study which included a language arts log with over 150 predefined items measuring instruction. The log was filled out at the end of language arts lessons by the observed teachers themselves and two trained observers. They found that agreement between observers was higher than agreement between observers and teachers. One important interpretation of this finding was that teachers' contextual knowledge, experience, intent, and long-range goals are part of the interactive process, and observers sometimes lacked crucial contextual information that teachers possessed.
In their study to validate structured observations, Mann et al. (1991) made continuous recordings of mother-infant pairs for one hour and used trained observers to record behaviors on log-sheets using different time intervals. They found that both the length of the time interval (i.e., 5 to 60 seconds) and the real-time duration of the behavior affected the mean percent error in the behavior durations estimated from the simulated structured observation data. As length of the time interval increased, mean percent error increased, and as real-time duration of a behavior increased, mean percent error decreased.  used structured observations with grid-based log sheets to record 43 pre-defined tasks on a single day at five-minute intervals. They postulated that observations should take place on multiple days and at multiple timepoints throughout the school year to reduce error due to day-to-day and cyclical variation.

Experience-Sampling
Under experience-sampling (ESM), a principal is prompted electronically (e.g., beeped) at either fixed (e.g., hourly) or random time-points to record their current activity using a self-administered log application listing pre-defined tasks. This method has been shown to have good short- and long-term reliability for assessing frequency and patterning of daily activity and social interaction, among other things (Csikszentmihalyi & Larson, 1987). In the school leadership literature, Spillane et al. (2007) used ESM to estimate principals' work-time in various categories and have demonstrated good validity in comparison to observers' free-form recordings of activities. However, they observed drop-off in compliance during the six consecutive days of recording. Hormuth (1986) reviewed the use of ESM in the personality psychology literature. A post-investigation questionnaire shed some light on non-compliance. Two of the most frequently given reasons for not answering a prompt included: It is frequently not possible to fill out the questionnaire immediately (65% yes) and Filling out the questionnaire disrupts my daily routine (22% yes). While using the ESM method would not seem to be burdensome time-wise for the principal, it may be viewed as a distraction or inconvenience, leading to low compliance rates.  mention this possibility among comments regarding the ESM method, stating: "The method, however, still suffers from the potential biases inherent in self-reporting. An additional drawback to ESM is that the surveys take time to complete and are thus necessarily limited in their scope so as not to overly disrupt the principal's workday" (p. 2). Additionally, depending on the complexity of the ESM log application, a training session for the participants may add significantly to time and expenses. However, training is necessary to ensure the integrity of data.

Daily Logs
In recent years, educational leadership studies have often utilized a closed-ended time allocation diary, or daily log (e.g., Augustine et al., 2009; Correnti, 2007; Sebastian et al., 2018). Such logs are completed only once per day, and are sometimes referred to as end-of-day logs. Camburn et al. (2010) tested the use of a web-based, closed-ended daily log with 48 principals to estimate time use for nine domains of responsibility. The study design called for the log to be completed at the end of the workday for five consecutive school days at each of seven time-points over a 2-year period. Camburn et al. found good congruence, overall, between the category-specific proportion of time estimated from hierarchical linear modeling (HLM) of the daily log and ESM data (separately). The largest difference in estimated proportion of time was for the category student affairs, whose estimate from the daily log data was 3.45% higher than that from the ESM data. The level 2 variance component of an HLM model estimates between-subject variance. The authors found fairly large discrepancies in between-subject variance estimates from the daily log data and ESM data for the categories building operations (daily log variance smaller) and finances (daily log variance larger). They observed that: …The differences may reflect limitations in how well the sample of observations obtained for the experience-sampling instrument on a given day represent the full range of activities that occurred on that day. …Whereas the daily log instrument is a retrospective recall of all activities for a given day, the experience-sampling instrument captures a random sample of activities (p. 722).
Under the assumption that the ESM method is a benchmark, the authors concluded that principals underreported time spent on building operations and finances. They derived four potential explanations for underreporting: 1) activities were brief and comprised a minor fraction of all activities over the hour (e.g., signing paperwork); 2) activities occurred in a noncontinuous manner (e.g., attempting to write a memo but being frequently disrupted); 3) time bias: activities during the middle of the hour were reported less accurately; and 4) activities were overshadowed by an urgent or dramatic event.

Stylized Questions
Stylized questions such as those used in the DSLS raise concerns about bias and recall error. In a workshop report published by the National Research Council (2000), the participants concluded that there are several reasons why stylized questions can be associated with a significant amount of error. First, activities that are considered socially preferable may be overreported. For example, in the DSLS survey instrument, spending time on External Relations (i.e., liaisons with community organizations) may be perceived as a socially desirable activity. Second, it may be difficult for a respondent to conceptualize the timeframe in question, e.g., an "average" workweek. Third, some activities which are being monitored may occur simultaneously. Last, the researcher must anticipate the activities which may occur and define them a priori in the data collection instrument. Citing Goldring et al. (2008) and Scott et al. (1990),  note that "recent advances in self-report data collection methods, such as end-of-the-day logs and experience sampling methods (ESM), have reduced some of these potential biases" (p. 492).

Research Questions
The DSLS includes a section in which principals are asked to estimate the average number of hours per week spent in total and on tasks that fall into five leadership categories. The inclusion of these items affords us the opportunity to assess the validity of self-report time use by examining the congruence of responses. This study is guided by the following research questions: 1. How strong is the congruence between a principal's estimate of the average number of hours worked per week in total and the sum of the average number of hours worked per week on each of five distinct categories of leadership activities? 2. Does the extent of congruence between these two measures vary by characteristics of the principals or their schools?

Survey Design and Sample Selection
Data for this study primarily comes from a survey that was distributed to non-chartered public high school principals in the state of Missouri. The DSLS form and protocol were reviewed and approved by Missouri State University's Institutional Review Board (IRB). The names, school addresses, and email addresses of all non-chartered public high school principals in the state of Missouri were obtained from a database maintained by the Missouri State Department of Elementary and Secondary Education (DESE) in July 2019. Two modes of completing the survey were utilized simultaneously, namely: a link to an online version of the survey designed using Qualtrics (https://www.qualtrics.com) and a paper version of the survey. As per the instructions, each principal was guided to complete either the online or paper version of the survey by a three-week mailing deadline.
Surveys were mailed to 503 high school principals. Thirty-one surveys were undeliverable by the United States Postal Service. Twenty-three email replies were received stating that the targeted principal had either retired or changed jobs. Twenty-six principals sent out-of-office automatic email responses indicating that they were unavailable until after the survey deadline. One hundred twenty-two surveys were returned (65 paper and 57 online). Based on the number of principals who received the survey before the deadline, the response rate was 29% (122 of 423). Twenty-four of the returned surveys (20%) were dropped from analysis because they were insufficiently complete, leaving a final sample of 98 (see Table 1). The completed paper survey data was entered into an Excel worksheet and the resulting records were combined with those derived from the online responses. Unique identifiers were added to the records in the combined file. The Excel worksheet was imported into SPSS version 24.0 for data cleaning and analysis.
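The sample-flow arithmetic in this paragraph can be retraced with a short sketch. The counts are those reported in the text; the variable names are illustrative only and do not come from the study.

```python
# Counts as reported in the text; variable names are hypothetical.
mailed = 503
undeliverable = 31
retired_or_changed_jobs = 23
out_of_office = 26
returned = 122
dropped_incomplete = 24

# Principals assumed to have received the survey before the deadline.
received = mailed - undeliverable - retired_or_changed_jobs - out_of_office

response_rate = returned / received
dropped_share = dropped_incomplete / returned
final_sample = returned - dropped_incomplete

print(f"received: {received}")
print(f"response rate: {response_rate:.1%}")
print(f"dropped share: {dropped_share:.1%}")
print(f"final sample: {final_sample}")
```

Rounded to whole percentages, these values agree with the 29% response rate and the 20% of returned surveys dropped, as stated above.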
A recent analysis was conducted to examine the representativeness of the 98 schools led by the principals in the final sample relative to all non-chartered public high schools in Missouri. The results confirm that the final sample is representative of all non-chartered public high schools in Missouri in 2018. This analysis is available upon request from the corresponding author.

Leadership Activity Categories
Principals were asked to estimate the average total number of hours per week they spend working at school (Total Hours Worked or THW) and on each of five categories of leadership activities (Grissom & Loeb, 2009; Spillane et al., 2007). We labeled the categories/domains as follows: (1) Table 2).  and Sebastian et al. (2018) estimated (using different methods), from real-time recordings of activities of the same sample of principals, that between 7.7% and 18.8% of work time in the daily schedule was not utilized for "work" duties. We contend that there is no firm basis on which to dispute that these "transitional" activities are randomly distributed across our five categories and that, therefore, any "error" attributable to unrecorded time spent on those activities should be equivalent across the five categories.

Horng et al. (2010) Camburn et al. (2010) and Sebastian et al. (2018) Study design
Self-reported average hours per week estimates using prescribed time intervals and task categories.
Not a time-focused study. Items rated for effectiveness. EFA used to form the dimensions.
Structured observations in five-minute intervals using prescribed task categories on a single day from 7 am until 6 pm.
Web-based self-administered daily log with hourly prompts for prescribed task categories. Design called for seven 5-day logs over a two-year period.

Computation of Time Use Measurements
In the survey instrument, response choices for time use variables were given as intervals of hours. Interval midpoints were substituted to ease comparisons and to allow items to be added together. The interval '<=10' was assumed to mean '0-10'. The open intervals '>40' and '>60' were assumed to have a width equivalent to the closed intervals, i.e., 10 (Hanneman, et al., 2012). The calculated midpoints were rounded down to the nearest integer to ease interpretation. For Total Hours Worked, the initial response choices and the final substituted midpoint values are as follows: These latter substituted values were then utilized to compute a new time use measurement, the sum of time spent on each of the five categories of leadership activities (Sum of Leadership Activities or SLA).
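The midpoint substitution described above can be illustrated with a minimal sketch. It assumes that response labels take the forms '<=10', 'a-b', and '>n', and that an open interval '>n' spans n+1 through n+10; the function names and the example category responses are hypothetical, since the full list of response choices is not reproduced here.

```python
import math

# Assumption from the text: open intervals get the same width as closed ones.
OPEN_INTERVAL_WIDTH = 10


def interval_bounds(choice):
    """Turn a response label into (lower, upper) bounds in hours."""
    label = choice.strip()
    if label.startswith("<="):
        # '<=10' is read as '0-10'.
        return 0, int(label[2:])
    if label.startswith(">"):
        # '>60' is read as '61-70' (assumed reading of "width 10").
        lower = int(label[1:]) + 1
        return lower, lower + OPEN_INTERVAL_WIDTH - 1
    lower, upper = label.split("-")
    return int(lower), int(upper)


def substituted_midpoint(choice):
    """Interval midpoint, rounded down to the nearest integer."""
    lower, upper = interval_bounds(choice)
    return math.floor((lower + upper) / 2)


def sum_of_leadership_activities(category_choices):
    """SLA: sum of substituted midpoints over the five categories."""
    return sum(substituted_midpoint(c) for c in category_choices)


# Hypothetical responses for the five leadership categories.
example_categories = ["11-20", "<=10", "<=10", "11-20", "<=10"]
print(substituted_midpoint("51-60"))
print(sum_of_leadership_activities(example_categories))
```

Under these assumptions the interval 51-60 maps to 55, which matches the midpoint quoted for the modal Total Hours Worked response.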

Measure of Congruency
It is possible that a responding principal could either greatly overestimate their average total hours worked per week or greatly overestimate their average hours worked per week for one or more category of leadership activities (or even both). We define 'underreport' as an instance when the reported total hours worked per week (THW) is less than the summated total hours worked (SLA). We then devised a simple three-point scale to represent the extent of congruence between reported (THW) and computed (SLA) total work time with the following definition: '1' represents cases for which there is potential underreporting of average total hours worked (i.e., THW < SLA); '2' represents cases with congruence (i.e., THW = SLA); and '3' represents cases for which there is potential overreporting of average total hours worked (i.e., THW > SLA).
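The three-point scale can be written as a small function. This is a sketch of the definition given above, not the authors' SPSS syntax; the function name and the example pairs are hypothetical.

```python
from collections import Counter


def congruency_code(thw, sla):
    """Three-point congruency scale from the definition in the text.

    1 = potential underreporting (THW < SLA)
    2 = congruence (THW == SLA)
    3 = potential overreporting (THW > SLA)
    """
    if thw < sla:
        return 1
    if thw == sla:
        return 2
    return 3


# Hypothetical (THW, SLA) pairs using substituted midpoint values.
example_pairs = [(55, 65), (55, 55), (65, 45), (45, 55)]
codes = [congruency_code(thw, sla) for thw, sla in example_pairs]
print(codes)
print(Counter(codes))
```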

Data Analysis
In the analysis, we utilized Cohen's kappa to assess agreement between the reported average total hours worked per week (THW) and the summated average hours worked per week in each of the five leadership activity categories (SLA). Cohen's kappa coefficient is a measure of agreement which is valid when two variables are measured using the same response scale and that scale takes on a limited number of distinct values (Chen, 2019).
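For readers unfamiliar with the statistic, unweighted Cohen's kappa compares the observed proportion of agreement with the agreement expected by chance from the two marginal distributions. The following is a minimal pure-Python sketch with hypothetical data; the analysis itself was run in SPSS, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal proportions, summed over
    # every label that appears in either variable.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum((counts_a[k] / n) * (counts_b[k] / n) for k in labels)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical THW and SLA midpoint values for four principals.
example_thw = [45, 55, 55, 65]
example_sla = [45, 55, 65, 65]
print(cohens_kappa(example_thw, example_sla))
```

Because the chance-agreement term uses every label observed in either variable, the sketch also handles the case in which one variable takes on more distinct values than the other.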
We also used Somers' d (Somers, 1962), a nonparametric measure of association, to assess the association of the three-point measure of congruency (regarded as an ordinal dependent variable) and classes of various personal and school characteristics (regarded as ordinal independent variables). It is worth noting that any dichotomous variable can be treated like an ordinal variable (Agresti, 2010); consequently, it is valid to use Somers' d to assess a relationship between an ordinal and a dichotomous variable.
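As an illustration of the asymmetric form of the statistic, Somers' d for a dependent variable Y given an independent variable X is the number of concordant pairs minus the number of discordant pairs, divided by the number of pairs not tied on X. The sketch below is a plain pairwise implementation with hypothetical data, not the SPSS procedure used in the analysis.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' d of y (dependent) given x (independent).

    d = (concordant - discordant) / (pairs not tied on x)
    """
    if len(x) != len(y):
        raise ValueError("x and y must have equal length")
    concordant = discordant = untied_on_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # pairs tied on x do not enter the denominator
        untied_on_x += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 here means a tie on y only: counted in the
        # denominator but neither concordant nor discordant
    if untied_on_x == 0:
        raise ValueError("Somers' d is undefined when all pairs tie on x")
    return (concordant - discordant) / untied_on_x


# Hypothetical data: x is a dichotomous characteristic coded 0/1,
# y is the three-point congruency code.
example_x = [0, 0, 1, 1]
example_y = [1, 2, 2, 3]
print(somers_d(example_x, example_y))
```

A dichotomous characteristic coded 0/1 can be passed as x without modification, consistent with treating it as an ordinal variable.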

Agreement between Total Hours Worked and Sum of Leadership Activities
The bivariate relationship of the two variables is depicted using a pyramid chart (see Figure 1). The distribution of total work time reported by the principals, Total Hours Worked, is symmetric and narrow, with a modal response of 51-60 hours/week (midpoint of 55). The distribution of summated total work time, Sum of Leadership Activities, is nearly symmetric and has the same modal response, but is much wider. It approximates a normal distribution, as would be anticipated from the Central Limit Theorem. Total Hours Worked was cross-tabulated by Sum of Leadership Activities to examine congruence (see Table 3). Reported total work time and computed total work time were congruent for 32% (30/95) of the principals. Applying the scale developed by Landis and Koch (1977) for Cohen's kappa coefficient, this simple observed statistic (0.32) indicates 'fair' agreement. Cohen's kappa coefficient for this comparison was much smaller at 0.10, indicating 'slight' agreement, but was statistically significant (p = 0.05).

Personal and School Characteristics Associated with Congruence
The congruence between reported (THW) and computed (SLA) total work time was further explored by crosstabulating the constructed three-point measure of congruency against key personal and school characteristics of the principals and assessing those relationships using the nonparametric Somers' d measure of association. Two personal characteristics demonstrated significant associations with the congruency variable. Female principals were much more likely (56%) than male principals (36%) to 'underreport' total work time ; p = 0.050), and much less likely (16%) than male principals (31%) to 'overreport' total work time. The relationship between the congruency variable and principals' highest degree was not uniform in direction. Principals with a specialist degree were much more likely (52%) than those with either a master's degree (29%) or a doctorate (29%) to 'underreport' total work time. On the other end of the congruency scale, an inverse trend was observed for 'overreporting' total work time by highest degree holders, with 38% of master's degree-holders, 26% of specialist degree-holders, and 21% of doctoral degree-holders 'overreporting'. Because the bivariate relationship was not unidirectional, the Somers' d test statistic is nonsignificant (Somers' d = -0.049; p = 0.588). However, the Pearson chi-square test indicates a moderately significant bivariate association (chi-square = 8.426; 4 df; p = 0.077).
One school characteristic, size of teaching staff, was found to be moderately significantly associated with the congruency variable ; p = 0.082). Principals from schools with a large teaching staff (>50) were more likely than those from schools with a small staff to be 'congruent' (43% vs. 27%, respectively) and much less likely to 'overreport' (11% vs. 34%, respectively).

Discussion
The primary goal of this study was to assess the validity of self-report time use by principals on leadership tasks. Ideally, a time use study of principals would involve real-time recording of time spent on each category of leadership tasks for a specified number of days at multiple time points throughout the school year using either structured observations (Bentley et al., 1994) or experience sampling (Camburn et al., 2010). However, these methods can be burdensome for the principal and may be a deterrent to participation (Hormuth, 1986; Spillane et al., 2007;). In this study, principals were asked to estimate the average number of hours worked per week in total and in each of five leadership categories. While acknowledging the recall error and recall bias inherent in such estimates, we attempted to get a sense of the magnitude of these limitations.
As anticipated, the constructed total work time measurement, SLA, has a larger number of classes (i.e., 9) as compared to the recalled total work time measurement, THW (i.e., 3). Hence, the Cohen's kappa coefficient is based on a nonsymmetrical table with three to nine classes. Chen (2019) conducted a simulation exercise to explore factors influencing the value of kappa. She concluded that, in general, a kappa based on fewer than six classes should be interpreted cautiously, while kappa's 'resiliency' is maximized when it is based on between six and twelve classes. In this study, with a kappa of 0.10, we deduce that the congruence between the recalled and constructed work times is only slight (Landis & Koch, 1977).
While it is not possible to discern direction (i.e., over or under) from this cross-sectional study, we were able to explore whether congruency was associated with some key personal and school characteristics. Defining potential 'underreport' as reported total work time being less than total work time computed by summating times reported for each of the five categories of leadership activities, female principals were much likelier to 'underreport', while male principals were much likelier to 'overreport', total work time. Perhaps male principals are responding to perceived societal expectations that professional men should work extra-long hours, while professional women need to attend to the needs of their families at the end of the workday.
It is plausible to argue that the demands, accountability measures, and espoused roles of the principals as instructional leaders present some tensions in practice, including use of time (Hallinger et al., 2020). A principal's time spent on a task may not reflect the importance they place on that activity since their responsibilities are numerous and overlapping. Tharp-Taylor et al. (2009) at RAND Corporation surveyed principals in the Pittsburgh Public Schools, asking the amount of time spent on 10 leadership activity categories during a 'typical' week, and whether they felt that the amount of time was appropriate or should be either more or less. Four of the five categories in which the largest percentage of principals reported spending more than 15 hours in a typical week (student discipline; compliance with Title I or special education requirements; other management issues; communicating with parents and community) were also among the five categories in which the largest percentage of principals reported that they should be spending less time.
Overreporting was inversely related to the principal's highest completed degree, and principals with a doctorate were the most likely to have congruence between the total time measures. Perhaps the additional years of formal training result in better time management skills and attention to schedule planning. Similarly, principals from large schools were more likely to have congruence and much less likely to 'overreport'. Principals from large schools may be more attuned to their time allocation due to juggling larger staffs and student bodies and more curricular and non-curricular activities.
The findings of this study reflect the comment by Horng et al. (2010, p. 492) that self-report surveys allow for large samples, but often sacrifice depth and, perhaps, accuracy. Studies utilizing self-report surveys (e.g., Lavigne et al., 2016; Ghamrawi & Al-Jammal, 2013) are likely to be susceptible to self-reporting and memory biases.

Conclusion
Self-report surveys appear to be an inaccurate method to measure the use of time. Additionally, the findings suggest that under- or overreporting may not be random. Studies of principals and principalship in which use of time is a key component of the study objectives should include more detailed, albeit more time-consuming and possibly costly, techniques for collecting time use data in order to increase accuracy and ease interpretation of the findings. Further studies are needed to examine principals' work patterns more deeply in order to understand if principals allocate time differently based on different expectations and pressures.

Recommendations
Principals were asked to estimate the number of hours per week spent on a list of leadership activities, which were clustered into five distinct categories. While changes in the principals' roles have brought instruction more to the center of their work, principals need district-level support that will release more time for them to be effective instructional leaders. More research is needed with larger sample sizes that focuses on how principals spend their time in different policy and social contexts, including large vs. small schools, urban vs. rural schools, schools with homogeneous vs. diverse student populations, and high vs. low poverty schools. More robust data collection strategies need to be utilized to overcome inherent potential biases.

Limitations
While data from the DSLS provide a valuable opportunity to examine principals' ability to estimate work time, there are several limitations. First, we are limited by the fact that only one data collection method, surveys, was used. This is likely to sacrifice depth and, perhaps, accuracy. Second, our sample size is too limited to generalize the findings to most contexts. Finally, principals' use of time in instructional leadership, specifically, needs to be measured against what states and policy makers consider to be of priority for school effectiveness. Qualitative data on principals' voices regarding what makes school leaders effective will embellish such studies.

APPENDIX
The following is a reproduction of the time use section of the Dimensions of School Leadership Survey instrument.

Principal's Work and Time Use
In this section, we are interested in how you spend and allocate your time to various leadership categories, including curriculum and instructional leadership, school management, organization management, internal and external school relations. Please estimate the time you spend on the following categories every week.
14. On average, how many total hours per week do you work at school?