Passing The College Stress Test

Building a Live College Financial Health Dashboard

Alex
The Startup

--

One of the major effects of the coronavirus has been to accelerate trends that were already in motion. Retail, an industry already in peril, has seen some of the sharpest declines in absolute sales since the start of Q2 2020 (though there has been a bit of a rebound since then). Similarly, we’ve watched as schools and universities lean on video instruction and other techniques as they (at least try to) reopen amidst the pandemic while minimizing risk to students. Some have been more successful at this than others.

Because things aren’t dire enough out there, and because my home life has been generally too serene, I dug into The College Stress Test (TCST), a book detailing the challenges facing many higher education institutions in the coming years. The authors, Robert Zemsky, Susan Shaman, and Susan Campbell Baldridge, lay out a series of metrics to measure the current and future health of a given institution.

I describe these metrics more fully below. In brief, TCST’s metrics focus on an institution’s financial and operational health. Finances are a combination of average net price (i.e. whether an institution is discounting its product more over time) and operating support (state appropriations for public schools, endowment assets and total expenses for private schools); operating metrics are defined by the number of new first-time degree seeking undergraduate students and first-year retention.

For people interested in both education and data, TCST is a good book, not least because it includes a how-to manual for enterprising readers to perform their own “stress test” given a spreadsheet and an internet connection to grab the data. Being an enterprising reader myself, I wanted to see if I could build something to track institutional health based on the parameters outlined in TCST and available through IPEDS (the Integrated Postsecondary Education Data System, essentially a clearinghouse for standardized data about higher education in the US).

It’s important to note at the outset that TCST does not establish a predictive model to forecast closures; the authors argue that “[t]here simply were not enough closed institutions, and those few that have closed were almost exclusively from the bottom of the market.” (4) The upshot is that things aren’t as dire for colleges and universities as they might be. Looking at the data, it’s clear that, for now, higher education is a winner’s market. Institutions with the lowest scores tend to be more famous, bigger, and more selective, while those with more uncertain futures are smaller, less well known, and less successful at retaining and graduating their students. But individual data points can be just as interesting as trend reports. This post is about looking at those individual institutions and diving into the data.

The idea behind this post is to integrate an IPEDS API with the criteria in TCST and wrap everything up in an app that runs on the web. I’ll be using R and Shiny to build this app. This post will go through the architecture and use of that app. I’ll begin, though, by briefly discussing some findings from the data, and from TCST itself. I’ve posted an R notebook (.Rmd) as well as the source code to the Shiny app on GitHub. You can see the app in action here.

Education 231: Higher Education Economics

The authors divide institutions into one of five classes based on graduation rates (here I quote from p. 24):

Medallion: the segment comprising the nation’s most competitive institutions and students; a segment for which prestige-based ranking plays a substantial role in defining institutional ambitions and quality.

Name Brand: a segment largely populated by well-known institutions. Most practice selective admissions, though their appeal is more likely to be regional than national. Many but not all of these institutions would like to be Medallions.

Good Buy: a segment comprising a variety of institutions, for the most part offering full-scale undergraduate programs at prices substantially less than those of higher-ranked institutions.

Good Opportunity: a segment comprising institutions and students who see higher education as a special opportunity. Many students who shop in this segment are the first in their families to attend college.

Convenience: the one segment in which part-time and intermittent learners dominate. Students in this segment often shop for a friendly environment at an institution that understands special needs, including the need to take courses at convenient times.

We can get a visual sense of the distribution of these market segments by looking at a k-means plot based on IPEDS data. I’m using the six-year graduation rate and average net price as my variables.

K-means clustering four-year schools on graduation rate and average net price
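For reference, here’s roughly how that clustering might be produced. This is a minimal sketch, assuming a data frame schools with one row per institution and columns grad_rate_6yr and avg_net_price (the names are mine, not IPEDS variable names):

# standardize both variables so neither dominates the distance metric
schools_scaled <- schools %>%
  drop_na(grad_rate_6yr, avg_net_price) %>%
  mutate(across(c(grad_rate_6yr, avg_net_price), ~ as.numeric(scale(.x))))

set.seed(42) # k-means depends on random starting centroids
km <- kmeans(select(schools_scaled, grad_rate_6yr, avg_net_price),
             centers = 5, nstart = 25)

schools_scaled %>%
  mutate(cluster = factor(km$cluster)) %>%
  ggplot(aes(avg_net_price, grad_rate_6yr, color = cluster)) +
  geom_point(alpha = 0.5)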

We can see pretty clearly that there are divisions at the top of the market, with some very expensive schools that graduate students at high rates, but lots of clustering near the means on both axes. In fact, the dense cluster near the center might make the clustering a bit suspect in this case. Certainly the Medallions (yellow, in the upper right) make a distinct cluster, but when we come in near the sample mean there’s relatively little to differentiate some of the Name Brands (blue) from Convenience (pink) or Good Buys (light blue). These distinctions might be more apparent (and might especially differentiate the Convenience segment) if we included two-year colleges in the analysis.

I generated a scree plot to confirm this intuition. Looks like we could actually get away with as few as three or four clusters, but five doesn’t look unreasonable. It’s worth noting here that about 24% of the schools analyzed for clustering were placed in the first “Convenience” cluster, suggesting that our findings, using slightly updated data, are roughly in line with TCST.

How many school categories are there?
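The scree (elbow) plot itself just tracks the total within-cluster sum of squares as k grows; a quick sketch, reusing the scaled frame from above:

# total within-cluster sum of squares for k = 1..10
tibble(k = 1:10) %>%
  mutate(wss = map_dbl(k, ~ kmeans(select(schools_scaled, grad_rate_6yr, avg_net_price),
                                   centers = .x, nstart = 25)$tot.withinss)) %>%
  ggplot(aes(k, wss)) +
  geom_line() +
  geom_point()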

Zemsky et al. identify schools in the lower left as those most likely to suffer financial distress in the coming years, while they expect “medallions” to thrive. “The winners [in their analysis] were the big, top-ranked institutions. Conversely, being small, being located in the Mid-Atlantic, the Plains, or the Great Lakes, proved to be a disadvantage, as was being at or near the bottom of the market in terms of competitiveness.” (27)

The parameters of the titular “stress test” vary by class, but broadly cover two areas: enrollment and finances. For example, here are the metrics the authors use to evaluate the current and future health of a private four-year college or university:

  1. Percent change in freshman enrollment;
  2. First-year retention rate;
  3. Market price (i.e. tuition less institution-granted financial aid);
  4. Percent change in ratio of endowment to expenses.

The metrics are similar for four-year public, for-profit, and two-year institutions; for public schools, state appropriations stand in for the endowment-to-expenses ratio.

Scores, which range from 0–12 for four-year institutions, are assigned as follows:

Each college and university can receive a risk score for a given factor ranging from 0 to 3. An institution receives a 0 for a particular component if (1) the change over time (or for retention, the rate) does not reach the alert line and (2) the trend line of the data does not have a negative slope that is projected to reach the alert line…. By contrast, an institution receives a 3 if (1) the negative trend reaches the alert line, (2) the “change over time” both falls below the alert line and (3) reaches or falls below the warning line…. Risk scores of 1 or 2 are achieved if some but not all of the criteria are met. (49)

In other words, we’ll divide our data into classes and look at each distribution along the variables outlined above. We assign critical values (“alert” and “warning” lines) based on those distributions. For example, we can take a quick look at changes in first-year enrollment among four-year colleges and universities and establish our critical values (“warning” in orange, “alert” in red):

If at least two of a school’s data points fall below the “warning” line, that’s one point; if the fitted trend line crosses the “warning” line, that’s another point; if the trend instead reaches the more severe “alert” line, that’s two points. The sum of those points across the four metrics (which ranges from 0–12) is our stress test score.
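To make that concrete, here is a minimal sketch of how a single metric might be scored. This is my own reading of the rubric (in particular, I assume the two-point alert penalty supersedes the one-point warning penalty on the trend), not the authors’ code:

# score one metric 0-3 for one school, given its yearly values and
# the warning/alert cutoffs (alert is the more severe threshold)
score_metric <- function(values, years, warning, alert){
  # project the linear trend to the final year of the panel
  trend_end <- predict(lm(values ~ years),
                       newdata = data.frame(years = max(years)))
  points <- 0
  if (sum(values < warning, na.rm = TRUE) >= 2) points <- points + 1 # data below warning
  if (trend_end <= alert) {
    points <- points + 2 # trend reaches the alert line
  } else if (trend_end <= warning) {
    points <- points + 1 # trend reaches only the warning line
  }
  points
}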

Architecture 112: Design Thinking

Once the app is built, we can check whether it works by comparing our results against those reported in TCST.

I’m using Jason Bryer’s ‘ipeds’ package for R to get the data. It’s worth mentioning now that I could not get this to work on R version 4.0.0. I was successful after upgrading to R 4.0.3.

First, I download and explore the data to identify the points we need, then I store the necessary tables in a data folder that the app will have access to (we might be able to script the download on the server side of the Shiny app, but this will be faster and cleaner).

These are the available IPEDS data sets by year (note that I hadn’t downloaded all these at the time I wrote this piece of code):

require(ipeds)
require(tidyverse)
require(knitr) # for kable()

available_ipeds() %>% tail(15) %>% kable() # table formatting
Available IPEDS data sets 2007–2020

We can see that some are final and some are provisional. Final data are fully revised and released about one calendar year after the provisional data, which in turn appear about a year after the data are collected. Thus the finalized 2017–2018 IPEDS data set contains data collected in 2015–2016, while the provisional 2018–2019 data set contains data collected in 2017–2018.

Each of the IPEDS years listed here represents a series of tables. We need to identify the tables (and years) we need. Once we’ve done that, we can discard some of the additional (unused) data, and create a directory for our app.
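That storage step might look something like the following; the paths and the “HD” table name are illustrative (IPEDS table names vary by survey year):

# keep only the tables the app needs and write them to a local folder
dir.create("app/data", recursive = TRUE, showWarnings = FALSE)
ipeds_2019 <- load_ipeds(2019)
saveRDS(ipeds_2019[["HD"]], "app/data/institutional_characteristics.rds")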

TCST lays out in Appendix A how to build your own stress test. The workflow used there is the same one we’ll adapt to the app. The steps are:

  • Get the data and chart it;
  • Tabulate a stress index;
  • Determine “alert” and “warning” values (these are based on the distribution of data points along each of the metrics).

TCST looks at long-term changes in a college’s fortunes. The authors chart their metrics over an eight-year period, between 2007–2008 and 2015–2016. Because we’re going to be fitting a regression line to our data, we’re going to need to extract data from several years of surveys.

To recap, we have four primary metrics that we’re looking at: enrollment numbers, first-year retention rate, market price, and endowment / expense ratio. The IPEDS tables that correspond to these metrics are:

  1. Institutional Characteristics, where we find institutional information like public vs. private, two-year vs four-year, and primary keys for each school;
  2. Fall Enrollment, where we get enrollment data;
  3. Finance, where we get information on endowments and expenses.

I’ve posted an R notebook detailing the full data extraction process in the GitHub repo, here; below, I walk through it briefly.

We download and extract the tables to see what we’ve got:

download_ipeds(2019, cleanup = TRUE, dir = 'ipeds_data')
load_ipeds(2019) %>% names() # check names -- these are the available tables from the 2019 IPEDS surveys

Each of the IPEDS files comes with a large data dictionary in .XLSX format. I looked through that documentation manually to identify both the tables and variables needed for the stress test.

My goal was to create a “master” data frame to hold all the necessary variables that will then be accessed by our app. Here I demonstrate an example of extracting one variable, in this case enrollment numbers:

extract_enrollment <- function(year){
  #' extract first-time undergraduate enrollment from the IPEDS surveys for a given year
  ipeds <- load_ipeds(year)
  survey_string <- paste0("DRVEF", year - 1)
  survey <- ipeds::ipeds_survey(survey_string, year = year)
  survey %>%
    select(UNITID, EFUG1ST) %>%
    mutate(year = year)
}

data_list <- list()
for (year in seq(2012, 2019)){
  data_list[[as.character(year)]] <- extract_enrollment(year)
}
enrollment_panel <- do.call(rbind, data_list)

# calculate the change over time per Zemsky et al. by indexing each
# school's enrollment to its first observed year (base = 100)
enrollment_panel_indexed <-
  enrollment_panel %>%
  arrange(UNITID, year) %>%
  group_by(UNITID) %>%
  mutate(index_enrollment_change = EFUG1ST / first(EFUG1ST) * 100) %>%
  ungroup()

I follow this process for the other variables, then join them together on the primary key (UNITID) and year to create panel data; check the linked notebook for the complete process.
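The joins themselves are straightforward; a sketch, where retention_panel_indexed and finance_panel_indexed stand in for the analogous frames built in the notebook:

# assemble the master panel on school ID and year
panel <- enrollment_panel_indexed %>%
  full_join(retention_panel_indexed, by = c("UNITID", "year")) %>%
  full_join(finance_panel_indexed, by = c("UNITID", "year"))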

With all of our variables extracted and stored, we can compute the critical values (the “alert” and “warning” levels described above) we need to assign scores.

critical_values <-
  data %>%
  group_by(id) %>%
  slice(2:n()) %>% # drop each school's base-index row
  ungroup() %>%
  group_by(control) %>% # public vs. private
  summarise(
    enrollment_warning = quantile(index_enrollment_change, .2),
    enrollment_alert = quantile(index_enrollment_change, .1),
    # retention is handled separately -- we need the index rows for that
    price_warning = quantile(index_price, .2, na.rm = TRUE), # several NaNs to remove
    price_alert = quantile(index_price, .1, na.rm = TRUE),
    ratio_warning = quantile(index_ratio, .2, na.rm = TRUE),
    ratio_alert = quantile(index_ratio, .1, na.rm = TRUE),
    appropriation_warning = quantile(index_appropriations, .2, na.rm = TRUE),
    appropriation_alert = quantile(index_appropriations, .1, na.rm = TRUE)
  )

To recap the methodology briefly: scores range from 0 (good) to 12 (bad). There are four metrics: enrollment, retention, average net price, and either the ratio of endowment assets to total expenses (for private schools) or the level of state appropriations (for public schools). Each metric can take a value from 0–3: if at least two data points fall below the “warning” level in a category, that’s one point; if the linear trend line for a component falls below the “warning” level, that’s one point; if it falls below the more severe “alert” level, that’s two points instead.

Because some schools (especially smaller colleges and vocational schools) don’t report all their data to IPEDS (or because some values, like total expenses or endowment assets, are entered as 0), I’ve filtered the sample to include only schools whose data were complete enough to generate a score. All told, I have data and scores for 1,711 institutions: 675 public and 1,036 private schools.
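That filtering step might look something like this, assuming the joined panel from above (the column names are mine):

# keep only schools whose panel is complete enough to score:
# no missing indexed metrics and no zero-valued finance entries
complete_panel <- panel %>%
  group_by(UNITID) %>%
  filter(!any(is.na(index_enrollment_change)),
         !any(is.na(index_price)),
         !any(coalesce(index_ratio, 1) == 0)) %>%
  ungroup()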

Again, the source code for the app, comprising the UI and server functions, can be found on GitHub. The key pieces are the stress test score itself (0–12 for public and private four-year colleges) as well as visualizations of the various indicators laid out in TCST.

The resulting dashboard is designed to give users a choice of college or university (again, only four-year institutions are included to reduce the complexity of the dataset) and then generate stress test metrics and graphs for each of the elements outlined above. I’ve included not only individual charts for each school and each metric, but also some visualizations describing the distributions of scores among a school’s peers. Note that here I control for both size and sector; public schools are only compared with other public schools, and private schools with private schools. Similarly, schools are grouped into quintiles based on incoming freshman class size, and are only compared with other schools in their quintile.
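To give a sense of the structure, here is a stripped-down sketch of the Shiny scaffolding. The real app on GitHub does much more; panel, scores, institution_name, and stress_score are assumptions standing in for the actual objects and columns:

library(shiny)
library(tidyverse)

ui <- fluidPage(
  selectInput("school", "College or university",
              choices = sort(unique(panel$institution_name))),
  textOutput("score"),
  plotOutput("enrollment_plot")
)

server <- function(input, output, session){
  school_data <- reactive(filter(panel, institution_name == input$school))

  output$score <- renderText({
    paste("Stress test score:",
          scores$stress_score[scores$institution_name == input$school])
  })

  output$enrollment_plot <- renderPlot({
    ggplot(school_data(), aes(year, index_enrollment_change)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) # the fitted trend line
  })
}

shinyApp(ui, server)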

The link to the app is here.

Physics 311: Special Topics in Calamity Physics

TCST is skeptical but frank about the dangers facing US colleges and universities:

What the numbers tell us is that just 10 percent or less of the nation’s colleges and universities face substantial market risk. Sixty percent face little or no market risk. But the remaining 30 percent are institutions that are bound to struggle….(117)

We’ve already identified graphically those schools that are likely to struggle given graduation rates and prices.

The losers — those institutions already at substantial risk — owe their bad luck to a grab bag of unexpected consequences.

What do the data show about the overall distribution of risk? Here’s the private school distribution:

And publics:

It looks like risk is more or less evenly distributed across sizes, with a bit of additional risk among smaller schools (the spike among large private schools reflects a sample of just two institutions).
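For what it’s worth, plots like these take only a few lines once the scores are tabulated (scores and its columns are, again, assumed names):

# distribution of stress scores by size quintile, private schools
scores %>%
  filter(control == "private") %>%
  ggplot(aes(stress_score)) +
  geom_bar() +
  facet_wrap(~ size_quintile, nrow = 1)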

And on which metrics are schools failing? First, public schools:

And private schools:

Overall, it looks like schools are most likely to struggle with student retention and enrollment. For public schools specifically, increasingly limited state appropriations are a serious concern. (This has been in the news recently, both because of the coronavirus and because of broader austerity measures by state governments.)

The good news, of course, is right there on the X-axis. The mean score, among both private and public institutions, is low; the median score is 0. Most colleges, be they public, private, large, or small, are not in immediate trouble.

Next Steps

There’s a wealth of data available through IPEDS, some of it useful. Though TCST’s authors decline to produce a predictive model, there are other avenues we could pursue. One is a survival analysis looking at which colleges do or do not make it. In the sample I initially collected for this project, covering 2011 to 2018, I found 119 schools that had data in 2011 but none in subsequent years. I haven’t investigated these fully, but they likely represent closures. Taking TCST’s metrics as a starting point, we might be able to construct a worthwhile survival model for colleges.
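A minimal sketch of where that might start, using the survival package; the closure flag here is just “disappeared from IPEDS before the end of the panel,” which would need real verification before modeling:

library(survival)

# one row per school: years observed, and whether it dropped out of IPEDS
surv_df <- panel %>%
  group_by(UNITID) %>%
  summarise(time = max(year) - min(year) + 1,
            closed = max(year) < 2018) %>%
  ungroup()

# Kaplan-Meier curve of institutional survival
km_fit <- survfit(Surv(time, closed) ~ 1, data = surv_df)
plot(km_fit, xlab = "Years observed", ylab = "Share of schools surviving")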

Until then, I invite my readers to look at the dashboard and pursue their own investigations. This is an interesting time to evaluate the fortunes of higher education; hopefully administrators at struggling schools are already aware of changes that need to be made.
