Book review: The Evolution of Great World Cities

Looking at Yves Marchand and Romain Meffre’s recent photoessay on Detroit, one can’t help but wonder what happened. How did a city that was literally the engine of the American economy sputter and decay into a mass of peeling paint, broken windows, and faded twentieth-century glamour? And on the flip side, how can a city like Dubai, with its Burj Khalifa tower stretching nearly 1 km into the sky, rise out of the desert in such a short period of time?

These are some of the questions that Chris Kennedy’s new book, The Evolution of Great World Cities, seeks to answer. Kennedy is a professor of civil engineering at the University of Toronto with a soft spot for cities and economics. Launching the book in London last week, he noted that he originally wanted to investigate the wealth of cities in much the same way that Adam Smith had done for the wealth of nations. But along the way, the book changed into something subtly different and the result is a fascinating mix of macroeconomics, infrastructure engineering, history, and ecology.

Philadelphia's City Hall

The book has three main themes. First, it provides a useful definition for the wealth of cities, namely the cumulative assets of their citizens. This omits the value of public buildings and infrastructure, a distinction that seems counter-intuitive at first. How can a city’s most prominent buildings, such as Philadelphia’s $6 billion town hall, not be included in such a total? However, as Kennedy describes, the value of these facilities is reflected in the locational value of people’s homes; a home with access to first-rate transport, water, and energy supplies will have a higher value than a cabin in the woods with no such services and amenities. The importance of citizen ownership is also demonstrated, using the example of 16th-century Seville. Tonnes of Incan gold may have flowed through Seville’s gates, but most of it was destined for the hands of foreign owners, namely the bankers in Antwerp and Genoa who financed many of the trans-Atlantic expeditions.

The second theme is the connection between economic growth and the physical structure of cities. Take the automobile as an example. Its introduction in the early twentieth century led to a new mode of urban development, suburban sprawl, which, although it has many disadvantages, certainly leads to increased consumption. Cars need to be manufactured, sold, and serviced; larger suburban homes need to be constructed with more materials and filled with consumer goods. The key issue here is not the infrastructure itself, but the modes of consumption that it necessitates. For example, one might expect that the 1990s IT revolution would have led to a demand-side crisis. Just like Smith’s pin-makers, those displaced by the productivity gains of IT – bank tellers, backroom operations, and so on – would be unemployed and unable to consume. However, IT also created new opportunities in PC manufacturing, software design, and innovative business models like eBay and Amazon. This new online way of life drove a new cycle of demand, rejuvenating the economy.

To me, this is the book’s most important contribution. This idea of the autonomous consumption of infrastructure, that is, the societally mandatory level of minimum consumption created by these systems, is hugely important for understanding not just the economic growth that Kennedy is concerned about, but more broadly the sustainability of cities. North American urban sprawl is a perfect example. While it engenders high levels of consumption, maintaining that lifestyle depends on a throughput of resources, most importantly abundant and affordable transport fuel. Should these fuels become significantly more expensive, these cities would grind to a halt. The same level of infrastructure-mandated autonomous consumption would still be there, but it would now consume a much larger portion of a household’s income, leading to reduced savings, reduced investment in new opportunities, and eventual stagnation. In the language of sustainable development, the autonomous consumption of infrastructure represents a liability that must be serviced on an ongoing basis.

The third theme is an analysis of urban economic processes as ecological systems. The relevant chapter offers several noteworthy ideas but it feels incomplete compared to the rest of the tightly-argued book. Nevertheless, an excellent description of Detroit’s decline is provided and the statistics cannot fail to astound: a population decline of 50% between 1930 and today, over 60 square miles of vacant abandoned land (44% of the city’s area), local tree species poking through factory floors that once produced millions of automobiles. Kennedy argues that, in much the same way that an ecosystem needs diversity to survive a changing environment, Detroit was too focused on cars to survive, both in terms of the limited diversity of its economy and the inflexibility of its infrastructure. One of the great what-ifs posed by the book is what Detroit might look like now had a 1923 proposal for a subway and integrated transit system been implemented.

Illustrated with case studies on cities as diverse as Toronto and Montreal, Philadelphia and New York, Seville, Paris, Dubai, and London, The Evolution of Great World Cities is a unique work of economic geography. Engineers often complain that economic models are too abstract to offer a meaningful understanding of the real world. Kennedy has therefore done both professions a great service by presenting a strong argument that it is the links between our built environment and economies that matter most.

Chris Kennedy has also written a blog post about the book over at the World Bank’s Sustainable Cities site.

Posted in General, Reading

Grrr…

I’ve been working through Gelman et al.’s otherwise excellent Bayesian Data Analysis and it’s going reasonably well. My statistics is a little bit rusty so it’s taken time to work through all of the exercises and really understand what’s going on. But I say “otherwise excellent” because yesterday I spent ages trying to figure out a problem, only to discover that the data published in the book don’t correspond to the text discussion.

The troublemaker is the SAT problem in section 5.5. The authors give values for two variables y_j and \sigma_j for j=1\ldots8 schools, rounded to integer values. Using the formulas given in the text, I then calculated estimates for the complete pooling statistics and came up with a posterior treatment effect, y, of 7.7 and a variance of 16.6. However, the text says these values should be 7.9 and 17.4 respectively.

I puzzled over this for quite a while, thinking that maybe I’d missed a prior/posterior distinction somewhere and my estimates were supposed to be subtly shifted. But no, when I went and checked the original data source, I found that the values were reported to 4 sig figs. Repeating the calculation with the unrounded values gives the expected results. Grr…

Here’s the data and R-code if anyone’s interested.

### Packages: reshape provides melt/colsplit/cast, plyr provides ddply
library(reshape)
library(plyr)

### Load SAT data from the Gelman et al book and the original Rubin paper
df <- data.frame(school=LETTERS[1:8],
                   book.y=c(28,8,-3,7,-1,1,18,12),
                   rubin.y=c(28.39,7.94,-2.75,6.82,-0.64,0.63,18.01,12.16),
                   book.sigma=c(15,10,16,11,9,11,10,18),
                   rubin.sigma=c(14.9,10.2,16.3,11,9.4,11.4,10.4,17.6))

### Rearrange data into a handy form (one row per school and source)
df <- melt(df,id="school")
vals <- colsplit(df$variable,"\\.",c("source","metric"))
df <- cbind(df,vals)
df <- df[,-2]
df <- cast(df,school+source ~ metric)

### Calculate the summary statistics
### (y is the pooled estimate; sig is its variance, not a standard deviation)
ddply(df,.(source),summarize,y=sum(y/sigma^2)/sum(1/sigma^2),sig=1/sum(1/sigma^2))

Created by Pretty R at inside-R.org
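For anyone without the reshape and plyr packages, the same complete-pooling calculation can be sketched in base R. This is only an illustration using the figures listed above; the helper name `pooled` is made up for this example. The pooled estimate is the precision-weighted mean of the y values, and its variance is the reciprocal of the summed precisions.

```r
# Data as listed in the post: rounded book values and Rubin's original values
book.y      <- c(28, 8, -3, 7, -1, 1, 18, 12)
book.sigma  <- c(15, 10, 16, 11, 9, 11, 10, 18)
rubin.y     <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
rubin.sigma <- c(14.9, 10.2, 16.3, 11, 9.4, 11.4, 10.4, 17.6)

# Hypothetical helper: complete-pooling estimate and its variance
pooled <- function(y, sigma) {
  w <- 1 / sigma^2                       # precision weights
  c(y = sum(w * y) / sum(w),             # precision-weighted mean
    variance = 1 / sum(w))               # variance of the pooled estimate
}

book  <- pooled(book.y, book.sigma)      # roughly 7.7 and 16.6
rubin <- pooled(rubin.y, rubin.sigma)    # roughly 7.9 and 17.4
print(round(rbind(book, rubin), 1))
```

The two rows should reproduce the mismatch described above: the rounded data give the lower pair of values, the unrounded data give the pair quoted in the book.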

Posted in R

Sampling for Monte Carlo simulations with R

When doing Monte Carlo simulation, it’s important to pick your parameter values efficiently, especially if your model is computationally expensive to run. If the model takes two days to run, and a parameter x ranges from 0 to 10, it doesn’t make much sense to run it once at x=4.9 and again at x=5.1 if the regions x \in (0,4) and x \in (6,10) haven’t been explored at all.

To get around this problem, one can use quasi-random low-discrepancy sequences, which are designed to fill a parameter space efficiently. The R package randtoolbox provides implementations of common sequences, such as Halton and Sobol’, but the process involves a couple of steps that beg to be automated. The general process is:

  1. Generate an n x p matrix of uniformly distributed quasi-random values, where n is the number of simulations you wish to run and p is the number of parameters.
  2. For each column of the matrix, convert the quasi-random values to the parameter’s actual distribution by applying the inverse of its cumulative distribution function (cdf). So if you have a uniformly distributed value of 0.5, and you want to convert it to a normal distribution with mean \mu, you would get a parameter value of \mu for use in your simulation.
  3. Run your simulation with these parameter values, and analyse the results.
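The first two steps can be sketched in base R for a single parameter. This is not how randtoolbox works internally: as a stand-in, the snippet hand-rolls a base-2 van der Corput sequence (the one-dimensional building block of the Halton sequence), and the names `vanDerCorput`, `mu` and `sigma` are assumptions made for the example.

```r
# Hypothetical helper: first n points of the van der Corput sequence,
# obtained by mirroring the digits of i (in the given base) about the radix point
vanDerCorput <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    x <- 0
    f <- 1 / base
    while (i > 0) {
      x <- x + f * (i %% base)   # next digit, scaled to its fractional place
      i <- i %/% base
      f <- f / base
    }
    x
  })
}

# Step 1: low-discrepancy uniform values on (0, 1)
u <- vanDerCorput(7)   # 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875

# Step 2: map them to a normal distribution through the inverse cdf
mu <- 3
sigma <- 2
theta <- qnorm(u, mean = mu, sd = sigma)
print(theta)           # the first value is mu, because u[1] is 0.5
```

Each new point falls into the largest remaining gap, which is exactly the space-filling behaviour you want when every model run is expensive.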

I’ve written a little R function to make this process easier. You simply pass it the number of simulations you want to run, and a list describing each parameter, and it will return the Monte Carlo sample as a data frame. At the moment, it’s pretty rudimentary: each parameter is described by a name, a distribution name (matching the R abbreviations, e.g. “unif” for the uniform distribution, “norm” for the normal), and two parameters to describe the distribution.

# Generate a Monte Carlo sample
generateMCSample <- function(n, vals) {
  # Packages to generate quasi-random sequences
  # and rearrange the data
  require(randtoolbox)
  require(plyr)
 
  # Generate a Sobol' sequence
  sob <- sobol(n, length(vals))
 
  # Fill a matrix with the values
  # inverted from uniform values to
  # distributions of choice
  samp <- matrix(rep(0,n*(length(vals)+1)), nrow=n)
  samp[,1] <- 1:n
  for (i in seq_along(vals)) {
    l <- vals[[i]]
    dist <- l$dist
    params <- l$params
    # Call the quantile function (e.g. qnorm) for this distribution
    samp[,i+1] <- do.call(paste("q",dist,sep=""),list(sob[,i],params[1],params[2]))
  }
 
  # Convert matrix to data frame and label
  samp <- as.data.frame(samp)
  names(samp) <- c("n",laply(vals, function(l) l$var))
  return(samp)
}


Here’s a simple example to show how it can be used.

n <- 1000  # number of simulations to run
 
# List describing the distribution of each variable
vals <- list(list(var="Uniform",
               dist="unif",
               params=c(0,1)),
          list(var="Normal",
               dist="norm",
               params=c(0,1)),
          list(var="Weibull",
               dist="weibull",
               params=c(2,1)))
 
# Generate the sample
samp <- generateMCSample(n,vals)
 
# Plot with ggplot2 (melt comes from the reshape package)
library(ggplot2)
library(reshape)
samp.mt <- melt(samp,id="n")
gg <- ggplot(samp.mt,aes(x=value)) +
  geom_histogram(binwidth=0.1) +
  theme_bw() +
  facet_wrap(~variable, ncol=3, scales="free")
print(gg)


And here’s the resulting picture:

Histogram of three parameters in a Monte Carlo sample


Any suggestions on how to improve this function so that it has a more generic description of a distribution would be appreciated (e.g. for distributions with n!=2 parameters).
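One possible direction, offered as a rough sketch rather than a finished answer: pass the parameters to the quantile function with do.call, so that a distribution can take any number of them (named or positional). The function name `invertSample` is hypothetical, and it takes a ready-made matrix of uniform values `u` so that the example runs in base R; in practice that matrix could come from sobol() as in the function above.

```r
# Hypothetical generalisation: u is an n x p matrix of uniform values and
# vals describes p distributions, each with zero or more parameters
invertSample <- function(u, vals) {
  u <- as.matrix(u)
  stopifnot(ncol(u) == length(vals))
  cols <- lapply(seq_along(vals), function(i) {
    v <- vals[[i]]
    # Build the call q<dist>(u[, i], <params...>) whatever the parameter count
    do.call(paste0("q", v$dist), c(list(u[, i]), as.list(v$params)))
  })
  samp <- data.frame(n = seq_len(nrow(u)), do.call(cbind, cols))
  names(samp) <- c("n", sapply(vals, function(v) v$var))
  samp
}

# Distributions with zero, one and two explicit parameters
vals <- list(list(var = "Uniform",     dist = "unif"),
             list(var = "Exponential", dist = "exp",  params = c(rate = 2)),
             list(var = "Normal",      dist = "norm", params = c(mean = 0, sd = 1)))

# Illustrative uniform values (quartiles) rather than a real Sobol' sequence
u <- cbind(c(0.25, 0.5, 0.75), c(0.25, 0.5, 0.75), c(0.25, 0.5, 0.75))
samp <- invertSample(u, vals)
print(samp)
```

Named parameters have the side benefit of making each distribution’s description self-documenting.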

Posted in R

Announcing Lectures, my first GitHub project

For the past couple years, I’ve been using SVN for version control. It was my first introduction to these tools and it took a while to get the hang of branching, tagging, and resolving the dreaded conflicts. But now it’s an integral part of my workflow and I can’t count the number of times it’s saved my digital bacon.

However, I recently switched to GitHub for some of my projects. This was partly out of curiosity, as a number of prominent projects and coders seem to swear by it (Twitter, Hadley Wickham, Kieran Healy). But more practically, I didn’t want to be tied to a single server any more and the idea of a Git repository as a self-standing archive was very appealing.

I’ve now been using Git for a couple weeks and, again, I’ve had to figure out a new mystical language of push, pull, fetch, merge, etc. If you’re familiar with SVN or CVS, the trickiest thing is trying to figure out the Git analogues of all your usual tricks. Git from the bottom up is a fantastic resource in this regard and provides a clear explanation of Git’s internal structure and how to work with it. For more mundane, how-do-I-do-x type things, GitHub’s Help is very, umm, helpful.

Anyway, all that’s neither here nor there. I mainly wanted to announce my first real Git project: lectures! It’s a collection of LaTeX style files and Python glue to make the creation of both lecture notes and slides much easier. If you’ve used beamer, you’ll know that you can build these two sets of documents from a single source file – provided you’ve got two separate header files. With lectures, all you have to do is write a single source file and run a simple command:

> build-lecture source style

Out pop two PDF files, one for presentation and one with your full notes. There is also support for some basic style configuration. I’ve tried to keep this to a minimum: changing fonts (with XeLaTeX for OpenType support) and colours.

Have a go, try the code, and let me know what you think!

PS: I’ve just noticed that a similar project exists called BeamerLecture. Looks like the main difference is that it produces four outputs, hiding answers in pre-class notes, which is a nice feature. The formatting is closer to the beamer defaults though and it doesn’t use XeLaTeX for easy font changes.

Posted in General

Getters and setters in R

When I first started using R, one of the things that attracted me was its claim to be an object-oriented programming (OOP) language. Coming from a Java background, I was used to designing software with OOP concepts like encapsulation and inheritance but, when I turned my hand to R, I quickly realized that “object-oriented” meant something subtly different.

For those who are interested, the technical detail is explained at length in this paper and this blog (with examples). But what I want to do here is quickly illustrate how you can implement the common get/set method structure in R for a slightly different purpose.

In “traditional” OOP, you might have an object like a Shape with some attribute, say its area. In Java, this attribute would be accessed and altered using get and set methods, as in:

Shape s = new Shape();
s.setArea(10); // set the attribute value
int area = s.getArea(); // get the attribute value

Using this pattern offers advantages like protecting access to the value of the area field and ensuring that only valid values are set. While there is some disagreement about whether or not mutator methods represent good practice, every OOP coder must have had at least one occasion where a getter/setter was needed.

Which brings us back to R. I recently had a problem where I wanted to use something like a getter/setter in order to access a global variable consistently, from both inside and outside functions. Since variable scoping in R isn’t very intuitive, I wasn’t sure how to do this at first and so I thought the get/set paradigm might be helpful, even though the value I was getting and setting wasn’t really associated with an object.

Here’s what I ended up using. The trick is to create an explicit environment that stores the variable’s value behind the scenes, so that the user doesn’t have to worry about scope.

# Declare an explicit environment to hold the variable
e1 <- new.env()

# Sets the value of the variable (stored in e1, not in the caller's scope)
setArea <- function(value) {
  assign("area", value, envir=e1)
}

# Gets the value of the variable (errors if setArea has not been called yet)
getArea <- function() {
  return(get("area", envir=e1))
}


Again, this isn’t really object-oriented programming as I’m not manipulating an object (other than the environment e1) but the get/set pattern has an OOP pedigree. So if you’re coming from a Java or C++ background and are having trouble figuring out variable scoping, the above code might be useful for you. However if you want to do proper object-oriented getting and setting, I highly recommend John Myles White’s example of R object-oriented polymorphism in get/set methods.
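To make the scoping point concrete, here is a small usage sketch. It repeats the definitions so that it runs on its own, and the helper `doubleArea` is a made-up function that exists only to show the value being changed from inside another function.

```r
# Same pattern as above, repeated so the example is self-contained
e1 <- new.env()
setArea <- function(value) assign("area", value, envir = e1)
getArea <- function() get("area", envir = e1)

# Set the value at the top level
setArea(10)

# Hypothetical helper: reads and updates the value from inside a function,
# without needing <<- or knowing where the variable lives
doubleArea <- function() {
  setArea(2 * getArea())
}

doubleArea()
print(getArea())   # 20

# The variable lives only in e1, so nothing leaks into the global workspace
print(exists("area", envir = e1, inherits = FALSE))   # TRUE
```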

Posted in R

Positive coefficient regression in R

I’m currently working on a paper about simulating urban demands for electricity and gas at 5-minute resolution. To do this, I have a simple regression model that tries to explain observed consumption based on local population figures and simulated levels of activity demands (e.g. minutes spent at work, leisure, etc). The data set looks like this, where each of the letters is an activity code:

> head(data)
  zone  pop     elec       gas  A    B   I    J  L M     O   R    S     W
1    0 7412 46221768 124613714  0    0   0    0  0 0     0   0    0     0
2    4 7428 37345875 100944002 60 3060 120 1020  0 0  4900 390  510 28635
3    7 7464 20914281  64109628  0 1155 510  255  0 0 11000 225 2475 29580
4   10 7412 46221768 124613714  0  680   0  390  0 0  5145   0    0  9300
5   14 7128 69233086  36611811  0 1335 105  210 60 0  5970   0 2520 14910
6   17 7608 40783190  59343776  0  150   0  150  0 0  1500   0    0   555


I then performed a basic regression using lm, removing the intercept (the “- 1” term) because I want the population coefficient to play a similar baseline role in this analysis:

lm.elec <- lm(elec/365 ~ pop + A + B + I + J + L + M + O + R + S + W - 1, data)


Using Andrew Gelman’s helpful arm package, I can get a quick overview of the result as shown below. It doesn’t look too bad at first, but then I noticed all sorts of negative coefficients for the activity levels. This makes no sense: when individuals perform activities such as going to work, going to school, or shopping, we expect that their demand for electricity should go up, not down.

> display(lm.elec)
lm(formula = elec/365 ~ pop + A + B + I + J + L + M + O + R +
    S + W - 1, data = data)
    coef.est coef.se
pop    13.99     1.35
A     177.07    83.87
B     -63.90    27.80
I      -0.80    21.05
J      91.27    34.19
L   -1075.76   391.53
M      -5.57    73.50
O      55.47     9.90
R    -336.14   178.98
S      28.12     3.84
W      -8.19     3.16
---
n = 391, k = 11
residual sd = 159994.17, R-Squared = 0.65


Since a linear regression is essentially an optimization problem, my immediate thought was: can I just constrain the coefficient values so that none of them are negative? This would mean that some activities might end up with a coefficient of zero, and so no effect on consumption, but at least they couldn’t have a negative impact. And it turns out, yes, you can do this using the nnls package and its function of the same name.
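To illustrate the “regression as constrained optimization” idea without any extra packages, here is a rough base-R sketch on made-up data. It is not the algorithm that nnls uses (that package implements the Lawson-Hanson method); it simply minimises the residual sum of squares with optim and a lower bound of zero. All variable names and the simulated data are assumptions for the example.

```r
set.seed(42)

# Made-up data: the true coefficient on b is negative
n <- 100
X <- cbind(a = runif(n), b = runif(n))
y <- 3 * X[, "a"] - 1 * X[, "b"] + rnorm(n, sd = 0.1)

# Ordinary least squares without an intercept: b comes out negative
free <- coef(lm(y ~ X - 1))

# Same objective, but each coefficient is bounded below by zero
rss <- function(beta) sum((y - X %*% beta)^2)
fit <- optim(par = c(1, 1), fn = rss, method = "L-BFGS-B", lower = 0)
constrained <- setNames(fit$par, colnames(X))

print(free)
print(constrained)   # no coefficient is allowed below zero
```

For a real analysis the dedicated nnls function described next is the better tool; the point here is only that the non-negativity requirement is an ordinary bound constraint on the least-squares problem.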

The nnls function is not quite as user-friendly as lm so the first thing you have to do is manually define your input variable matrix and output vector. For example:

A <- as.matrix(data[,c("pop","A","B","I","J","L","M","O","R","S","W")])
b.elec <- data$elec/365

The analysis can then be run as:

library(nnls)
nnls.elec <- nnls(A,b.elec)

Similarly, we can’t use predict to generate our results and have to manually perform the matrix multiplication. This can be done as shown below.

coef.elec <- coef(nnls.elec)
pred.elec.nnls <- as.vector(A %*% coef.elec)

This method also doesn’t give you an R-squared value per se, but you can estimate it with the following dummy regression:

> lm.elec.dummy <- lm(b.elec~pred.elec.nnls - 1)
> display(lm.elec.dummy)
lm(formula = b.elec ~ pred.elec.nnls - 1)
               coef.est coef.se
pred.elec.nnls 1.00     0.04
---
n = 391, k = 1
residual sd = 164861.13, R-Squared = 0.62

As you can see, the R-squared value is slightly lower in this case than for the standard lm model, but in terms of interpreting the coefficients the result makes much more sense. This can be clearly seen in the following graph, where demands for electricity and gas rise during the day as expected, rather than sinking as they do in the basic lm case.

Simulated profiles for electricity and gas, comparing lm and nnls regressions
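Returning to the R-squared estimate for a moment: as a cross-check on the dummy regression, the same quantity can be computed directly from the residuals. For a model without an intercept, lm reports an uncentered R-squared, 1 - RSS / sum(y^2), so the sketch below uses that definition. The two short vectors are invented for illustration and stand in for b.elec and pred.elec.nnls.

```r
# Made-up observations and predictions (stand-ins for b.elec and pred.elec.nnls)
b    <- c(10, 12, 9, 15, 20, 18)
pred <- c(11, 11, 10, 14, 19, 19)

# Direct, uncentered R-squared of the predictions themselves
r2.direct <- 1 - sum((b - pred)^2) / sum(b^2)

# The dummy regression rescales the predictions by a fitted coefficient,
# so its R-squared can only be equal or slightly higher
r2.dummy <- summary(lm(b ~ pred - 1))$r.squared

print(c(direct = r2.direct, dummy = r2.dummy))
```

If the fitted coefficient in the dummy regression is close to 1, as it is in the output above, the two figures should be nearly identical.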


So there you go. If you ever need to run a regression and ensure that all the coefficients are greater than or equal to zero, nnls is your friend.

Posted in R