Measuring Purpose in Life in College Students: An Assessment of Invariance Properties by College Year and Undergraduate School

Purpose in life is a key construct in the development of young adults, particularly college students. There are many instruments measuring sense of purpose in life, but few studies have examined their measurement properties among college students. The current study compares the measurement invariance properties of the Purpose in Life (PIL) scale and the Claremont Purpose Scale (CPS) across college year and undergraduate school. Using both a unidimensional and a two-dimensional model, we found that the PIL’s interpretability is limited among college students. Using a three-dimensional model, the CPS was invariant with respect to both grouping variables. The study suggests that the CPS can be used to make meaningful comparisons among college students categorized by school year and undergraduate school. The study also has some implications about the construct of purpose in life; namely, scale structures that work well statistically and theoretically among adults might not generalize to young adults.


Introduction
Purpose in life, an aim that inspires life goals and provides a sense of meaning (McKnight & Kashdan, 2009), is a fundamental human motivation that drives and directs behavior (Frankl, 2006). It is a central outcome in positive psychology, as it is associated with various positive mental and physical outcomes (e.g., Blattner et al., 2013; Kim et al., 2020).
A sense of purpose is a particularly important outcome for emerging adults. Emerging adults, people aged 18-25, are individuals in transition between the dependency of childhood and the responsibilities and commitments of adulthood (Arnett, 2000). One of the developmental tasks of emerging adults is establishing their purpose in life (Damon, 2008); as they become more capable of complex thinking (Malin et al., 2014), they are expected to establish a strong sense of what they aim to do with their lives in this stage as a part of their identity formation (Burrow & Hill, 2011). Emerging adults who accomplish the task of developing a stable sense of purpose have a greater sense of well-being and show other markers of positive youth development (García-Alandete et al., 2018; Kadir & Mohd, 2021), while those who do not are more prone to depression, anxiety, and stress (e.g., Hong et al., 2018). The importance of purpose in life is evident in different cultures and contexts (e.g., Qin et al., 2015; Zhang et al., 2018). It is clear, then, that a sense of purpose in life is an important part of development, particularly in early adulthood.
Emerging adults' purpose can stem from any number of sources, such as career calling (Praskova et al., 2015; Yuliawati & Ardyan, 2020), faith (Culver & Lundquist Denton, 2017; Liang & Ketcham, 2017), and relationships (Glanzer et al., 2015; Lund et al., 2019). Colleges can also be the source of purpose or the location where this self-search takes place (Glanzer et al., 2017). Among college students, a strong sense of purpose is related to positive academic outcomes (e.g., DeWitz et al., 2009; Sharma & Yukhymenko-Lescroart, 2018), engagement (Yu et al., 2016), and lower rates of mental health issues, such as depression, and of alcohol-related problems (Pearson et al., 2015). Many higher education institutions see themselves as instrumental in supporting their students' development, including the development of sense of purpose (e.g., Shin & Steger, 2014). Some of them even implement interventions to pursue this goal. In a recent review, Pfund et al. (2020) suggest strategies universities may utilize to support the development of their students' sense of purpose. Some examples include individual work with academic advisors, service-learning courses, and peer mentorship. Indeed, some findings suggest that college education may help students be more proactive and have agency when seeking their purpose (Sumner, 2017).
As with any other intervention, efforts designed to help students develop a sense of purpose must be evaluated using high-quality instruments that can detect student-level changes over time. Many instruments exist to measure sense of purpose, either in the general population or specifically among college students. One of the most commonly used is the Purpose in Life scale (PIL; Crumbaugh & Maholick, 1964). Although it was not specifically designed for college students, it has been widely used among this population (Molasso, 2006) and in interventions related to positive psychology (Feldman & Dreher, 2012). However, this instrument might not be appropriate for the college student age group, as some of its items refer to issues such as retirement (Bronk et al., 2018). So, Bronk et al. (2018) developed the Claremont Purpose Scale (CPS), which was designed for and validated using a sample of emerging adults. Consequently, it is expected to be a more appropriate measure of purpose among college students. The CPS has also been used to evaluate interventions designed to enhance young adults' sense of purpose (Bronk et al., 2019).
In spite of the widespread use of self-report instruments to measure sense of purpose among college students, relatively few studies explore these instruments' measurement invariance qualities. Measurement invariance is a property of a scale that applies when the psychometric relationships among the items of the scale are maintained across different groups or settings (e.g., demographic groups, modes of administration, time points, etc.; Byrne et al., 1989). If scale invariance fails to some degree, interpretations of group comparisons are problematic because different scale structures mean that scores have different meanings for different groups. Hence, in the context of assessing students' development and of comparing groups of people, it is imperative that the scale produced by the measuring instrument be invariant.
Several studies have examined the measurement invariance of the PIL scale. They established the instrument's invariance by age and gender (García-Alandete et al., 2019) and between people undergoing therapy and healthy controls (Marsh et al., 2003). However, these studies focused on adolescents or adults and may not be relevant to a college population. For universities that seek to track the development of their students' sense of purpose following an intervention or throughout their normal progression of studies, it is important to establish the PIL's invariance across college years. Furthermore, these studies did not address other group differences that might be relevant for a college population, such as college major. As far as we know, no studies have explored the measurement invariance of the CPS among college students, although a recent study supported the measurement invariance of the Indonesian version of the CPS between males and females using a sample of adolescents (Yuliawati, 2021).
The current study explores the measurement invariance of both the PIL and the CPS for college year and undergraduate school. The PIL was selected because of its popularity in positive psychology research, and the CPS was selected as an instrument that was designed for young adults and, therefore, is expected to have better measurement properties than other instruments. Testing the measurement invariance of both instruments will enable researchers, college administrators, and counselors to consider their appropriateness for comparing levels of purpose by year and school. This will be a first step in testing the instrument's potential for tracking students' growth over time or as a result of specific interventions.

Instruments
Purpose in Life Scale (Crumbaugh & Maholick, 1964)
The PIL is one of the first and most popular instruments measuring sense of purpose. It is a 20-item scale with a seven-point Likert response format with response options varying by item. For example, one item is "I am usually…" with response options varying from "completely bored" to "exuberant, enthusiastic", and another item is "If I could choose, I would…" with response options ranging from "prefer never to have been born" to "like nine more lives just like this one". The authors present evidence of convergent and divergent validity and report split-half reliability of .81. In the current sample, the instrument's Cronbach α was .88. The content of the items is presented in Appendix A.
Claremont Purpose Scale (Bronk et al., 2018)
The CPS is a 12-item scale with a five-point Likert response format designed to measure three aspects of purpose (Damon et al., 2003): meaningfulness (having a goal that is important to oneself), goal-orientation (working hard to achieve that goal), and beyond the self (the goal has a potential impact on others). Each aspect was measured by four items. Bronk et al. (2018) present convergent and divergent validity evidence and report a Cronbach α of .92 for the entire scale and reliabilities above .85 for each subscale defined as one of the three aspects. In the current study, the Cronbach α for the entire scale was .88, and the reliabilities of the three subscales were .87, .88, and .88, respectively. The content of the items is presented in Appendix B.
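As a point of reference for the reliability coefficients reported for both instruments, Cronbach's α for k items can be computed from the item variances and the variance of the summed score. The following is a minimal Python sketch using made-up responses. It is not the study's data or the authors' code (the analyses were run in R), and the function name `cronbach_alpha` and the toy response matrix are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete-case data.

    rows: list of respondents, each a list of k numeric item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer every item")

    # Transpose so that each tuple holds one item's scores across respondents.
    items = list(zip(*rows))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])

    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 4 respondents x 3 items (not from the study).
toy_responses = [
    [1, 2, 3],
    [2, 1, 4],
    [3, 4, 2],
    [4, 3, 5],
]
print(round(cronbach_alpha(toy_responses), 3))
```

A sketch like this only covers the simplest case; it assumes complete responses and says nothing about whether a single α is appropriate for a multidimensional scale.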

Participants and Procedure
The data for the current study were collected as a part of a larger study (Ludlow et al., under review). We invited 5,000 randomly selected undergraduate students at a research university in the Northeastern US with a population of approximately 10,000 undergraduates to participate. Their gender, school year, and undergraduate school were provided by the university with the approval of the university's institutional review board (IRB number 19.293.01e). In this university, there are four schools for undergraduates: Arts & Sciences, Management, Education and Human Development, and Nursing. The current sample includes students from all four schools.
Eight hundred and thirteen of the 5,000 students responded to the survey (16%) and consented to participate. We retained 686 who responded to all items and had demographic data available. In order to keep the survey short, the participants were randomly divided into subsamples who took different instruments. About half of the sample took the CPS, and either the PIL or a life satisfaction scale (Diener et al., 1985) not discussed in the current study. The final sample included 174 participants who provided demographic information and took the PIL and 345 with demographic information who took the CPS. Table 1 describes each sample's demographics.

Analyses
To establish the measurement invariance of the instruments by school year and undergraduate school, we used confirmatory factor analysis (CFA) as our analytic framework. All analyses were conducted using R (R Core Team, 2017), and the CFAs were conducted using the lavaan package (Rosseel, 2012). In our analyses, we followed the guidelines suggested by Bowen and Masa (2015). Although their guidelines involve ordinal data, we treated our scales as continuous. This was for two reasons. First, we assume an underlying continuous construct (i.e., sense of purpose). Second, the sample size was too small to support an ordinal model that involves more parameters.
To conduct a CFA, the data must meet two assumptions: sufficient sample size and multivariate normality (Bryant et al., 1999). In continuous CFAs, the recommended minimum sample sizes can range from 100 to 500, making the current study's sample size adequate (Kyriazos, 2018). However, neither the PIL nor the CPS met the condition of multivariate normality (PIL: Mardia's skewness = 2138.63, Mardia's kurtosis = 9.35, p < .001; CPS: Mardia's skewness = 916.66, Mardia's kurtosis = 19.59, p < .001). This is likely because the participants tended to endorse the items and had relatively high scores. Therefore, all analyses were conducted using a robust maximum likelihood estimator.
According to Bowen and Masa (2015), establishing an instrument's invariance starts with identifying a common model that fits each group, and continues with a series of more restrictive models such that once a model does not fit the data, it is assumed that the corresponding level of invariance was not achieved. These models represent configural (the groups follow the same factor structure), metric (weak; they have similar item loadings), scalar (strong; the groups have the same item intercepts), and residual (strict) invariance (the groups have the same residual variances). An instrument is considered sufficiently invariant, i.e., the groups can be meaningfully compared, if it has at least scalar invariance. In addition, if the scales did not meet the conditions of one level of invariance, we tested for partial invariance by relaxing specific constraints (e.g., some items cross-load or some residuals correlate in one group but not the others).
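The stepwise logic described above can be sketched in a few lines of Python. This is only an illustration of the decision sequence, assuming that a yes/no fit verdict is already available for each level; the names `LEVELS`, `highest_invariance`, and `groups_comparable` are hypothetical, and the sketch does not fit any models or handle partial invariance.

```python
# Invariance levels, ordered from least to most restrictive.
LEVELS = ["configural", "metric", "scalar", "residual"]


def highest_invariance(fits):
    """Return the most restrictive level reached before the first failure.

    fits: dict mapping a level name to True (acceptable fit) or False.
    Levels missing from the dict are treated as not achieved.
    Returns None when even the configural model does not fit.
    """
    achieved = None
    for level in LEVELS:
        if not fits.get(level, False):
            break  # once a level fails, stricter levels are not credited
        achieved = level
    return achieved


def groups_comparable(achieved):
    """Group comparisons are considered meaningful from scalar invariance up."""
    return achieved in ("scalar", "residual")


# Hypothetical verdicts: the scalar model fails, so testing stops at metric.
example = {"configural": True, "metric": True, "scalar": False, "residual": True}
level = highest_invariance(example)
print(level, groups_comparable(level))
```

The point of the early `break` is that a later, stricter model is never credited once a less restrictive one has failed.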
We used several criteria to assess model fit: a χ² test comparing the models and the difference in the models' comparative fit index (ΔCFI) and root mean square error of approximation (ΔRMSEA). These measures of fit are commonly used in the literature and compensate for issues like sample size sensitivity (Putnick & Bornstein, 2016). We used a non-significant (p > .05) χ² test, CFI ≥ .96, and RMSEA ≤ .06 as indicating good absolute fit (Hu & Bentler, 1999), and a non-significant χ² test, ΔCFI ≥ -.01, and ΔRMSEA ≤ .015 as criteria of good relative model fit (Chen, 2007; Cheung & Rensvold, 2002).
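The cutoffs listed above can be written down as simple checks. The Python sketch below is an assumption-laden illustration rather than the authors' procedure: it reports each criterion separately instead of combining them into a single verdict, because the text does not state how disagreements among the criteria were resolved. The function names are hypothetical; the example values are the scalar-step statistics reported for the CPS by undergraduate school.

```python
def absolute_fit(chi2_p, cfi, rmsea):
    """Check one model against the absolute-fit cutoffs (Hu & Bentler, 1999)."""
    return {
        "chi2_nonsignificant": chi2_p > 0.05,
        "cfi_ok": cfi >= 0.96,
        "rmsea_ok": rmsea <= 0.06,
    }


def relative_fit(delta_chi2_p, delta_cfi, delta_rmsea):
    """Check a nested-model comparison against the relative-fit cutoffs."""
    return {
        "chi2_nonsignificant": delta_chi2_p > 0.05,
        "delta_cfi_ok": delta_cfi >= -0.01,
        "delta_rmsea_ok": delta_rmsea <= 0.015,
    }


# Scalar vs. metric model for the CPS by undergraduate school, as reported
# in the text: p = .002, delta CFI = -.006, delta RMSEA = .004.
scalar_step = relative_fit(delta_chi2_p=0.002, delta_cfi=-0.006, delta_rmsea=0.004)
print(scalar_step)
```

In this example the χ² difference test is significant while ΔCFI and ΔRMSEA are within their cutoffs, which is the kind of disagreement that the sample-size sensitivity of the χ² test can produce.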
Finally, in poorly fitting models, we also examined modification indices (MIs) to identify items that substantially harmed the model's fit, in an attempt to establish partial invariance. We then freed selected parameters one at a time, from the highest to the lowest MI, until a satisfactory model fit was achieved. To determine whether too many parameters had to be freed to obtain a good fit, we followed Dimitrov's (2010) recommendation of relying on the researcher's judgment or a criterion of fewer than 20% of the parameters freely estimated as indicative of acceptable partial invariance.
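The 20% criterion amounts to a one-line proportion check. The Python sketch below is illustrative only; the function names are hypothetical, and the example counts (8 freed parameters against 267) are the figures the text reports for the PIL school-year model.

```python
def freed_share(n_freed, n_candidates):
    """Proportion of candidate parameters that had to be freely estimated."""
    if n_candidates <= 0:
        raise ValueError("n_candidates must be positive")
    return n_freed / n_candidates


def within_partial_invariance_limit(n_freed, n_candidates, limit=0.20):
    """True when fewer than `limit` of the candidate parameters were freed."""
    return freed_share(n_freed, n_candidates) < limit


# Counts reported in the text for the PIL by school year.
print(round(freed_share(8, 267), 3), within_partial_invariance_limit(8, 267))
```

Passing this check does not settle the question on its own: the recommendation also leaves room for the researcher's judgment about whether the freed model still resembles the hypothesized one.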

Purpose in Life (PIL)
School Year
The PIL was designed as a unidimensional instrument where all 20 items are expected to load on the same factor. Accordingly, we tested this model in each school year group separately. We found that item 15 (preparedness for death) did not load on the purpose factor in any of the four groups; items 13 (I am responsible), 14 (humans should be free to make life choices), and 16 (suicidal thoughts) did not load on the purpose factor for juniors and seniors; item 8 did not load on the purpose factor among sophomores and seniors; and items 7 (activity after retirement) and 18 (life internally/externally determined) did not load on the purpose factor among seniors. Removing the items from subsequent analyses still resulted in a poorly fitting model. Therefore, we removed item 8 and items 13-16 from our model and used the MIs to identify parameters that adversely affected the models' fit. Then we relaxed the constraints on these parameters until achieving an acceptable fit. The final, well-fitting models had several covarying items' error terms (residuals) in each of the groups: six pairs of error terms among first-year students, two among sophomores, three among juniors. We were not able to reach a well-fitting model among seniors. Only items 2 and 5 (newness of each day) had correlated residuals in more than one group.
Even though there were multiple fit issues with the scale, we tested for configural invariance using a model that excluded items 8 and 13-16, excluded seniors, and allowed the residuals of items 2 and 5 to covary. The fit of the resulting configural model was still poor (χ²(267) = 393.20, p < .001; CFI = .890, RMSEA = .091). To achieve an acceptable fit, we allowed five pairs of item residuals to covary among first-year students, two among sophomores, and one among juniors. Ultimately, eight parameters had to be freed to achieve an acceptable fit, which meets the criterion of less than 20% of the 267 parameters that could be freed. However, we decided that the resulting model was too different from the original PIL model and too complicated for practical purposes; that is, too many parameters had to be freed relative to the hypothesized model. In addition, it did not include students in their senior year. Therefore, we concluded that for all practical purposes the PIL is not invariant with respect to school year; that is, the structure of the instrument varies substantially between groups.
In view of these issues with the original unidimensional model, we explored an alternative two-dimensional model of the PIL. Although several such models were explored in the literature, we chose to use Morgan and Farsides' (2009) two-factor solution, where items 2, 5, 7 (activity after retirement), 10 (life lived having been worthwhile), 17 (capacity to discover meaning), 18 (life internally/externally determined), and 19 (contentment in daily tasks) load on a factor named exciting life, and items 3 (presence of clear life goals), 8 (life goal completion), and 20 (presence of goals/life purpose) load on a factor called purposeful life. The two factors in this model were allowed to covary. Note that items 1, 4, 6, 9, and 11-16 are not included in this model, echoing our findings suggesting that items 13-16 do not load on the general purpose factor. Morgan and Farsides' (2009) model was selected as it was found to be the best fitting one out of a set of ten different models by Schulenberg and Melton (2010). We tested this model for each group separately. We found a good fit among sophomores. Among first-year students, we allowed item 17 to load on the purposeful life scale and items 2 and 5 to covary to achieve an acceptable fit. Among juniors, we allowed the residuals of items 2 and 5 and items 3 and 19 to covary to achieve an acceptable fit. Among seniors, we allowed item 17 to load on the purposeful life scale. Given the minimal fit issues, we continued testing for group invariance using the original two-factor model but allowing item 17 to load on the purposeful life scale.

Undergraduate School
We conducted a separate CFA for each undergraduate school group. Since there were relatively few Nursing (n = 5) and Education and Human Development students (n = 17), they were omitted from the analyses, leaving Arts & Sciences and Management students. We found that item 15 again did not load on the purpose factor for any of the groups; items 6 (wishing more lives), 13, and 14 did not load on the purpose factor in one of the groups. We removed item 15 from subsequent analyses but still had to allow the residuals of several items to covary to achieve acceptable fit: 15 pairs of item residuals had to covary among Management students, and five pairs of item residuals had to covary among Arts & Sciences students, with the residuals of items 17 and 20 allowed to covary in both groups. In spite of the multiple fit issues, we tested measurement invariance with a model including all items except for item 15 while allowing the residuals of items 17 and 20 to covary.
The fit of the configural model was poor (χ²(302) = 499.50, p < .001; CFI = .837, RMSEA = .085). We tested for partial invariance by freeing parameters with high MI until the model's fit was acceptable. We had to allow the residuals of six pairs of items to covary in the Arts & Sciences group, and another eight pairs in the Management group; these 14 freed parameters represent less than 20% of the possible parameters. Given the multiple fit issues, we concluded that the PIL is not invariant with respect to undergraduate school; the structure of the instrument varies substantially between groups.
We also attempted to test the invariance of Morgan and Farsides' (2009) two-factor model regarding undergraduate school using only Management and Arts & Sciences students. However, testing this model on one of the groups (Management) resulted in an unidentifiable model; consequently, we did not continue with the invariance tests.

Claremont Purpose Scale (CPS)
School Year
The basic model for the CPS involved three factors, previously referred to as "aspects" (meaningfulness, goal-orientation, and beyond the self), with four items loading on each factor. However, Bronk et al. (2018) generally report a single score and treat their instrument as measuring a single "purpose" construct. Therefore, we tested the CPS's invariance using both a three-dimensional model with covarying factors and a unidimensional model.
We tested the three-dimensional model for each school year group separately. The models fit well for first-year and third-year students, but we had to allow some items' residuals to covary in order to achieve an acceptable fit among second- and fourth-year students. For second-year students, these pairs were meaningfulness 2 ("How well do you understand what gives your life meaning?") and beyond the self 1 ("How often do you hope to leave the world better than you found it?"), and meaningfulness 3 ("How confident are you that you have discovered a satisfying purpose for your life?") and beyond the self 3 ("How important is it for you to make the world a better place in some way?"). For seniors, these pairs were meaningfulness 1 ("How clear is your sense of purpose in life?") and goal-orientation 2 ("How much effort are you putting into making your goals a reality?"), and goal-orientation 2 and beyond the self 3. Since there were relatively few fit issues and none of them appeared in more than one group, we used the original model without covarying residuals to test for configural invariance.
We repeated the analysis for a unidimensional model for each of the school year groups separately. We found a very poor fit in all of the groups and had to allow several items to covary. Most of these covarying items were from the same subscale (e.g., goal-orientation 1 and 2). We decided that the appropriate model was a three-dimensional one, and did not continue testing for configural invariance.
Undergraduate School
Three-dimensional CFAs were conducted for each undergraduate school separately. We found an acceptable fit for Arts & Sciences and Management students, but not for the other groups. Trying to release constraints for Education and Human Development and Nursing students resulted in an unidentifiable model. This may be due to the small sample size in each group (n = 31 and n = 13, respectively). We therefore removed these groups and continued with the analysis. Since the original model fit the data well in the other two groups, we tested the instrument's invariance using the original three-dimensional model, without allowing residuals to covary. Testing the configural model resulted in an acceptable model according to the CFI and RMSEA (χ²(102) = 170.52, p < .001; CFI = .978, RMSEA = .054). We continued testing for more restrictive forms of invariance, and found a good fit of the metric (Δχ²(9) = 3.74, p = .928; ΔCFI = .003, ΔRMSEA = -.005), scalar (Δχ²(9) = 26.23, p = .002; ΔCFI = -.006, ΔRMSEA = .004), and strict models (Δχ²(12) = 8.56, p = .740; ΔCFI = .007, ΔRMSEA = -.009). We concluded that the hypothesis that the CPS is strictly invariant with respect to undergraduate school is reasonable, at least for Arts & Sciences and Management students.
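For readers who want to see how the p-values attached to the Δχ² statistics relate to the statistics themselves, the upper-tail probability of a χ² distribution with integer degrees of freedom can be computed with the Python standard library alone. The sketch below is not the authors' code. It assumes that each reported Δχ² is the final test statistic referred to a χ² distribution with the stated degrees of freedom (under a robust estimator, the difference statistic is typically a scaled one, and that scaling is not reproduced here). The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2 and
    a = df / 2, starting from Q(1, y) = exp(-y) for even df or
    Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0

    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))

    while a < df / 2.0:
        # Work on the log scale to avoid overflow for large y or a.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Difference tests reported in the text for the CPS by undergraduate school.
for label, stat, df in [("metric", 3.74, 9), ("scalar", 26.23, 9), ("strict", 8.56, 12)]:
    print(label, round(chi2_sf(stat, df), 3))
```

Under the stated assumption, the printed values should land close to the p-values reported for the metric, scalar, and strict steps.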
We repeated these analyses using the unidimensional model for each school. Similar to our findings with respect to school year, the fit was poor in all of the undergraduate groups. The MIs were largest for items from the same subscale (e.g., goal-orientation 1 and 2). We concluded that the appropriate model was a three-dimensional one and discontinued our invariance tests.

Discussion
Properly measuring college students' sense of purpose is an essential first step in tracking its development and growth, both for college administrators and for therapists and counselors working with this population. Without high-quality instruments, it is impossible to make meaningful interpretations of a person's score or change in their score over time. Therefore, ensuring that instruments measuring purpose have similar psychometric properties across diverse groups, however they are defined, is crucial. The current study explored the measurement invariance of two instruments measuring purpose among college students: the PIL (Crumbaugh & Maholick, 1964) and the CPS (Bronk et al., 2018).
The instruments displayed different degrees of invariance. For the PIL, we could not establish configural invariance with respect to school year and undergraduate school. We also struggled with fitting a model in each group individually. This means that in the current sample and using the original PIL model, the PIL did not follow a unidimensional structure, and several of its items were not measuring a common sense of purpose across groups. These findings are consistent with other studies that found (a) some of these same items were problematic in other samples of college students (Morgan & Farsides, 2009) and (b) minimal support for a single-factor structure within the instrument (Schulenberg & Melton, 2010). Using a two-factor model that excluded several of the more problematic items, the PIL had partial metric invariance with respect to the school year, which is not considered sufficient for comparing groups. It was impossible to determine its invariance with respect to school. That is, it seems like the PIL may not be suitable for college students, at least based on the models investigated in the current study.
The CPS was more successful in terms of statistical fit. When using a three-dimensional model, the CPS was strictly invariant both with respect to school year and with respect to undergraduate school (using only Arts & Sciences and Management students). This means that using this instrument, researchers can meaningfully interpret group differences, as scores represent similar levels of the construct across groups. This result is similar to Yuliawati's (2021) findings even though she used different comparison groups, ages, and language; more work is needed to establish the measurement invariance of the CPS in other contexts if it is to be used regularly. We found no support for a unidimensional structure when considering year or undergraduate school. This means that researchers using the CPS should consider using and reporting the subscale scores rather than a single, composite purpose score.
In summary, this study's results support the use of the CPS over the PIL to assess purpose in life among college students. This conclusion is reasonable given that the CPS was tailored for this population and specifically addresses issues raised in previous studies on the PIL. The differences between the scales raise an important question about the measurement of purpose. Since the PIL seems to work well for some populations but not for college students, college students may understand purpose differently from those other populations; for college students, for example, a sense of purpose in life typically does not involve thoughts about death or retirement. Indeed, some findings suggest that the types of purpose and the effects of purpose on well-being vary by age (Ardelt & Ferrari, 2019; Battersby & Phillips, 2016). However, studies comparing how different age groups understand purpose in life and how this relates to measuring the construct are rare.
Beyond its implications for measurement, this study has some implications for practitioners, particularly college administrators interested in supporting their students' social and personal development. There are several strategies for helping students develop their sense of purpose (Pfund et al., 2020), and some formal interventions have been created and evaluated (e.g., Arnoux-Nicolas et al., 2018; Shin, 2013). However, to truly learn if an intervention works, one must use the proper tools to evaluate it. Any instrument used for this purpose must first carry the same meaning across the different groups it is used on. The results of this study suggest that the appropriate tool may vary by population, and in the case of comparing college students by school year and school, the CPS may be preferable.

Conclusions
The current study investigated the measurement invariance of instruments measuring purpose with respect to college year and school. We found that the CPS had stronger psychometric properties than the PIL for measuring sense of purpose in this population. College administrators, therapists, and others interested in enhancing students' sense of purpose should consider the measurement properties of the instruments they use to assess students' development, and take them into account when planning group comparisons.

Recommendations
This study is the first to explore the measurement invariance of these instruments measuring sense of purpose among college students. As such, its findings suggest that the CPS is appropriate for group comparisons among college students, but that the PIL may not be appropriate for that goal. The findings could also be seen as initial evidence that the CPS could be used to track students' development of sense of purpose during college, as the instrument is invariant with respect to school year. Therefore, practitioners seeking to evaluate their interventions over time should prefer the CPS. However, studies assessing the instruments' longitudinal invariance (whether scale structure remains constant over time) are needed to confirm this suggestion.
These results could mean that the construct "sense of purpose" is interpreted differently by different people. For example, college students in the current sample did not consider preparedness for death as a part of their sense of purpose in life, while others might (e.g., adult women; Chamberlain & Zika, 1988). Future researchers should consult with members of the target population and see how they understand the instruments they use, and perhaps revise existing definitions of the construct accordingly.
Future studies could also explore the measurement invariance of other instruments used to measure purpose, or with respect to other relevant groupings such as students who participate in interventions designed to enhance a sense of purpose vs. other students. Doing so will be valuable for researchers and practitioners interested in enhancing positive psychological constructs such as purpose, as it will allow them to make meaningful comparisons among groups. This study is one of the first steps in this direction.

Limitations
Despite its contributions, the study has several limitations. First, the relatively small sample size might have affected the models' fit. Furthermore, all of the participants were self-selected students from a specific university, so generalizations to other contexts or populations might be limited. Future studies should replicate these analyses using more participants who represent diverse institutions and subgroups. Finally, although we determined that the PIL was not group invariant using a unidimensional model and a two-dimensional model, testing for alternative factor structures (e.g., Jonsén et al., 2010; Su et al., 2006) was beyond the scope of the current study, and could be explored in the future.