Waterfall plots in R

Last week I posted about an orphaned paper of mine, a piece of research that hasn’t yet found a home and which I’m unlikely to find further time to work on. Abandoning something that you’ve worked hard on is disappointing and so it made sense to release it for those who were interested. At least that way, I can salvage some value from the whole exercise.

There’s another element of the paper that might be useful too, at least for the R community: waterfall charts. Here’s the example from the paper:

Summary of changes to global building energy demand in 2050 by intervention and sector.

And here’s another example from the Carbon Trust, about carbon flows and the UK economy.

Carbon emissions embodied in UK trade. Source: Carbon Trust

As you can see, these are effectively bar charts but designed to be read left-to-right so that you can see how a series of intermediate steps leads to a final conclusion. This can be implemented in ggplot and it’s largely just a question of getting the data in the right format. For the solution I’ve developed, you need to create a data frame bearing in mind a few simple guidelines.

  • The category column is a factor with the levels in left-to-right order. These will be used to determine the x-axis position.
  • Only the first and last entries in the value column represent the absolute measured units. The other entries should be given as changes relative to the previous column. The waterfall() function will throw a warning if the running total minus the last value doesn’t equal zero.
  • The sector column can be used to create divisions within each bar.
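
As a rough illustration of these guidelines, here is a minimal sketch of a data frame in this shape, together with the kind of consistency check described above. The category names, sectors and numbers are invented for the example, and the tolerance is my own assumption rather than something taken from the waterfall() function itself.

```r
# Hypothetical example data: category names, sectors and numbers are invented.
steps <- c("Baseline", "Efficiency", "Growth", "Final")

df <- data.frame(
  # Factor levels are given in left-to-right plotting order.
  category = factor(rep(steps, each = 2), levels = steps),
  # Optional sector column, used to split each bar.
  sector = rep(c("Residential", "Commercial"), times = length(steps)),
  # First and last categories hold absolute values; the rest are changes.
  value = c(60, 40, -15, -10, 20, 5, 65, 35)
)

# Running total of everything before the final category, minus the final total.
is_final <- df$category == steps[length(steps)]
residual <- sum(df$value[!is_final]) - sum(df$value[is_final])

# Compare against a small tolerance (an assumed value) rather than exact zero,
# so that decimal inputs don't trip the check through floating-point rounding.
tolerance <- 1e-8
if (abs(residual) > tolerance) {
  warning(sprintf("Final value doesn't return to 0. %.2f instead.", residual))
}
```

With these numbers the intermediate changes take the total from 100 down to 75 and back up to 100, which matches the final category, so no warning is raised.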

The resulting data frame looks like this:

Waterfall data frame

The hard work is done by the waterfall() function, which I’ve made available as a Gist. Given the data frame described above, this function does some additional manipulation and then returns a ggplot object, which you can then customize further as necessary. The resulting plot is shown below and the full code is available on GitHub.

Carbon emissions embodied in UK trade, drawn with ggplot.
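
For readers who want a feel for the "additional manipulation" involved, the sketch below shows one way the floating bars could be computed and drawn. It is not the code from the Gist: the helper name waterfall_limits(), the example numbers and the bar width are all assumptions of mine, and for simplicity it uses one row per category and ignores the sector split.

```r
# Hypothetical one-row-per-category data (invented numbers, no sector split).
steps <- c("Baseline", "Efficiency", "Growth", "Final")
df <- data.frame(
  category = factor(steps, levels = steps),
  value = c(100, -25, 25, 100)
)

# Hypothetical helper, not the Gist's code: work out where each bar starts and ends.
waterfall_limits <- function(df) {
  df <- df[order(df$category), ]        # rows in left-to-right (factor level) order
  n <- nrow(df)
  running <- cumsum(df$value[-n])       # running total after each bar before the final one
  # Each intermediate bar starts where the previous one ended;
  # the first and final bars are drawn from zero.
  df$start <- c(0, running[-length(running)], 0)
  df$end <- c(running, df$value[n])
  df$xpos <- seq_len(n)
  df
}

limits <- waterfall_limits(df)

# Plotting needs ggplot2; the limits above are plain base R.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(limits) +
    geom_rect(aes(xmin = xpos - 0.4, xmax = xpos + 0.4,
                  ymin = start, ymax = end, fill = end >= start)) +
    scale_x_continuous(breaks = limits$xpos,
                       labels = as.character(limits$category)) +
    labs(x = NULL, y = "Value", fill = "Increase")
  # print(p) draws the chart; p can be customized like any other ggplot object.
}
```

Splitting each bar by sector would need the same start and end calculation carried out within each category, which is where most of the extra bookkeeping comes from.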

10 thoughts on “Waterfall plots in R”

  1. Spencer Haley

    Thank you so much for sharing both your paper (which deserves a good home) and the waterfall construction. I hope to employ similar analyses around smaller-scale energy efficiency measures, and this post gives me a wonderful place to build from.

  2. Pingback: Bookmarks for February 28th | Chris's Digital Detritus

  3. James Keirstead Post author

    I’ve deliberately left the title ambiguous. “Plot” makes sense because the code uses ggplot, but chart and plot are synonymous to most people so I don’t think there’s any risk of confusion here. I’ve also linked to the Wikipedia page on waterfall charts.

    I’d never heard of a waterfall plot as a distinct type of 3D graph, but looking at the Wiki Talk page, I’d have to agree with the comment at the bottom: “Hands up everyone who thinks that a waterfall plot looks like a waterfall. I certainly don’t.”

  4. Guido

    Hey James, nice function you wrote there! Thx!
    I am trying to produce my own waterfall plot, but unfortunately I get the

    error in sprintf("Final value doesn't return to 0. %.2d instead.", final_value):
    invalid format '%.2d'; use format %f, %e, %g or %a for numeric objects
    Called from: sprintf("Final value doesn't return to 0. %.2d instead.", final_value)

    Does that mean I am not allowed to use decimal numbers!?!? That would be strange…

    My value vector df$value looks like this:
    8.32 1.15 -0.21 1.14 -0.08 2.32 0.29 0.08 -0.16 0.10 -0.48 12.47

  5. James Keirstead Post author

    Well-spotted Guido! That sprintf statement assumes that the input data are integer; I didn’t write any checks in the code to catch this. You can certainly use decimal numbers, either by changing the sprintf statement to use %.2f or by changing the error to a warning.

    The latter might be better for float values as, with rounding, your final_value might not equal 0 exactly. Additionally, I would rewrite the if statement to use the testthat package and an expect statement. That would let you set an explicit epsilon (or error tolerance) for checking that decimal values add to 0.

  6. Guido

    Thanks for your quick answer!
    I just found a workaround. I just modified the
    df <- df %>% mutate(cs1=cumsum(value))
    to
    df <- df %>% mutate(cs1=round(cumsum(value),2))
    and now it works!

  7. David Whitaker

    Hi, I’m trying to run the code exactly as posted, but I get this error:

    The following `from` values were not present in `x`: col, color, pch, cex, lty, lwd, srt, adj, bg, fg, min, max

    Any ideas?

Comments are closed.