
Book reaction: How Big Things Get Done

Flyvbjerg, Bent, and Dan Gardner. How Big Things Get Done: The Surprising Factors That Determine the Fate of Every Project, from Home Renovations to Space Exploration and Everything in Between. Currency, 2023.


In the spirit of How Big Things Get Done, let’s start with the end goal in mind - what do I want to get out of reviewing this book?

I saw a recommendation for this on a social media thread about what it takes to bring hobby software projects to completion. Since I would like to work on some side projects over the summer, but am concerned about how much I’ll actually be able to get out of them, I thought this book would be appropriate. I’m also interested in larger questions around why the US - especially at the state and local level - struggles to complete major infrastructure projects at all, let alone on time and on budget. The glib answer involves politics, veto points, and graft, but I suspect this is too simplistic and cynical. More personally, I just finished a minor home improvement project to replace a bathroom exhaust fan, and while it wasn’t a massive ordeal, I still underestimated the amount of time it would take and the complications that arose. This book sounded more and more relevant to my interests!

I wanted to identify what lessons I could take from Flyvbjerg and Gardner that would help me run projects more smoothly, with fewer nasty surprises. Despite how trivial some of the insights appear at first, I found the book worth my time. At worst, it’s a quick and accessible read and a good reminder of best practices I’ve encountered elsewhere.

Here are some selected thoughts - not a comprehensive walkthrough - that I’d like to share. This is more of a book reaction than a book review, as it may make more sense if you are already familiar with How Big Things Get Done.

Think right to left

“Projects routinely start with answers, not questions.” (p. 47)

Much of this was already familiar to me because it strongly rhymes with principles from Six Sigma methodologies like DMAIC and DMADV. That is, the very first step toward satisfying customer needs is to understand what those needs fundamentally are - and that means asking questions! Many of the cautionary case studies brought up to illustrate this reminded me of traps where the “Voice of the Customer” isn’t sufficiently explored; this is another lens on the same problem.

Critically, asking the right questions to clear up ambiguity in the desired end state necessarily leads to the “victory conditions” that we can test for to know if we are done. Then the act of measurement - whether we have hit a discrete milestone, or whether we have hit some good-enough metrics threshold - is itself answering a question through a test.

In that spirit, I’ve found it useful to explicitly try to answer “How do I know when I am done?” when planning out a new project.
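
To make that concrete for a software side project, here’s a minimal sketch of what writing down the “victory conditions” up front could look like - the criteria and statuses are entirely my own invented examples, not from the book:

    # A toy "definition of done": each victory condition is a named,
    # testable criterion rather than a vague aspiration. The criteria
    # and their statuses are invented for illustration.
    victory_conditions = {
        "core feature works end to end": True,       # e.g. a demo script passes
        "a stranger can install and run it": False,  # e.g. tested on a clean machine
        "README answers what/why/how": True,
    }

    def is_done(conditions):
        """Done means every victory condition is demonstrably met."""
        for criterion, met in conditions.items():
            print(f"[{'x' if met else ' '}] {criterion}")
        return all(conditions.values())

    print("Done!" if is_done(victory_conditions) else "Not done yet.")

The point isn’t the code - it’s that each criterion is phrased so that a test can answer it with a yes or a no.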

I also wonder to what extent public works, especially big infrastructure projects, would benefit from explicitly admitting that prestige is the goal to optimize for. Tellingly, this is a major motivation for the successful case study of the Guggenheim Museum Bilbao (pp. 48-50). World’s Fairs in the late 19th and early 20th centuries showcased cutting-edge inventions, in part to stimulate trade but often just for national prestige; only later did these technologies become commercially viable. Would it be better to have a demonstration high-speed rail in California that was explicitly optimized for prestige but completed on budget, or an incomplete and over-budget system that is nominally, but not credibly, in service of a serious regional transport plan? I’m not sure - but the current CAHSR is struggling to deliver its stated goals as is.

Think slow, act fast - and planning by tinkering

Earlier, the authors explicitly warn against “planning by doing” and underinvesting in the planning phase. What makes “planning by tinkering” different? Why isn’t this “winging it” or wasteful iteration?

I think the key to reconciling this is to focus on where the experimentation takes place. Flyvbjerg and Gardner emphasize experimenting with scale models, digital twins, etc., but there’s a more general principle here: move the iteration from costly space to cheap space.

When Flyvbjerg and Gardner write that “planning is cheap” (p. 75), this is only true if you are planning in cheap spaces! Many of the failure case studies they describe are actually still planning - they’re just doing it in production! The solution is therefore to iterate in inverse proportion to how expensive an experiment is; see the sketch after the list below.

There’s a spectrum of how costly iteration/rework is depending on its medium:

  • Design-on-paper, whiteboarding, etc. are very cheap to iterate on but are weak models of reality.
  • Simplified simulations are more costly to iterate but more faithful to the real thing.
  • Detailed digital twins, physical scale models, live rehearsals, etc. are even more informative but still more costly to set up. In particular, the initial setup (e.g. digital data collection) may have a high upfront cost even if repeated experiments are cheap on the margin.
  • Finally, real-world execution is the most costly but also, well, the real thing.
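
One way to make the “iterate in inverse proportion to cost” heuristic concrete: if the number of iterations in each medium should scale as 1/cost, then each medium gets an equal slice of the experimentation budget. Here’s a minimal sketch of that arithmetic, with all cost figures invented for illustration:

    # If iterations per medium ~ 1/cost, then spend per medium is constant:
    # iterations * cost = budget_slice. All cost figures are made up.
    media_costs = {
        "whiteboard sketch": 1,
        "simplified simulation": 10,
        "digital twin / scale model": 100,
        "real-world execution": 1000,
    }

    budget = 4000
    budget_slice = budget / len(media_costs)  # equal spend per medium

    for medium, cost in media_costs.items():
        iterations = budget_slice / cost
        print(f"{medium:28s} cost {cost:5d} -> ~{iterations:7.1f} iterations")

With these numbers you get a thousand whiteboard iterations but roughly one shot at real-world execution - which is exactly the “think slow, act fast” shape.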

The discussion of “Pixar planning” in Chapter 4 illustrates this well. The creative staff isn’t just iterating through ideas within the same level of resolution; they’re progressing across increasing levels of resolution at increasing cost. (Is there a potential analogy with denoising diffusion models?)

One more thought: each level of resolution needs its own “victory condition”. How do we know that we’ve accomplished what we want with this level of approximation? This is another point in favor of the planning-as-experimentation perspective mentioned above.

The value of experience

I would like to explore further the argument that first-mover advantage is overrated. The study referenced is Golder and Tellis 1993; it’s available online with a quick search. On its face this seems plausible; after all, for better or worse, this is the motivation for patent law!

I might rephrase Flyvbjerg and Gardner’s point as: every past example that can be learned from is an additional thumb on the scale to improve your likelihood of success. Identifying past experience (explicit or tacit) is then stacking advantages for the project.

Insofar as this runs into conflict with stakeholder desires for the “first” or “biggest” (p. 85), I think this is another case of the underlying goal (“the box on the right”) actually being prestige rather than operational outcomes. Which, if we are honest about it, is fine! But that may motivate better ways of executing a demonstration project.

The authors also discuss seeking out exemplars for subcomponents of projects, even if the sum-of-all-parts is unique. This seems reasonable, too, but not all projects can be implemented as combinations of tried-and-true existing approaches. Where does revolutionary, as compared to incremental, innovation happen? To make the implicit explicit, I think this follows from the process of asking and answering questions. It’s a good practice to start off expecting that a solution for the design requirement exists, then investigate why the existing solutions fall short of what’s needed. That then motivates the direction of the research. Rhymes with Chesterton’s Fence!

Forecasting

Chapter 6 complements the previous topic by discussing anchoring and adjustment. Specifically, how do we get poor project estimates if we rely on experience to formulate them, as recommended above?

The risk is specifically “settling on a bad anchor” - using the wrong reference class, or one that is too specific. The authors’ remedy is to identify a simple reference class with plenty of historical experience, and to look at actual outcomes to make estimates.

Stated another way, the accuracy of a project forecast (timeline, budget) is influenced not just by how relevant the comparable case studies are, but also by the number of comparable case studies, thanks to the law of large numbers. Flyvbjerg and Gardner assert that project planners often undervalue this second factor! When the number of case studies n is tiny, each additional data point has a huge impact on the accuracy. Their argument is that at these sample sizes, it is worthwhile to add an extra data point even if it’s less similar than the earlier case studies. There is a point where the tradeoff is no longer worth it - the accuracy improvement from a greater sample size is outweighed by the irrelevance of the next data point - but most project forecasts aren’t even close to that threshold.
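
A quick back-of-the-envelope model (mine, not the authors’) makes this tradeoff visible. Suppose the forecast is the average cost overrun across n directly comparable projects plus m less-relevant ones whose overruns differ systematically by b points. The mean-squared error of the pooled average has a variance term that shrinks with every added case and a bias term that grows with the share of less-relevant cases - all numbers below are invented for illustration:

    # MSE of a pooled-average forecast from n comparable projects plus
    # m less-relevant projects biased by b. Numbers are illustrative only.
    sigma = 30.0  # spread of overruns within the reference class (pct points)
    b = 15.0      # systematic bias of the less-relevant projects (pct points)

    def mse(n, m):
        total = n + m
        variance = sigma**2 / total     # shrinks as cases are added
        bias_sq = (m * b / total) ** 2  # grows with the less-relevant share
        return variance + bias_sq

    for n in (2, 10):
        row = ", ".join(f"m={m}: {mse(n, m):5.1f}" for m in (0, 1, 3, 10, 30))
        print(f"n={n:2d} -> {row}")

With n = 2, every less-relevant addition still improves the estimate; with n = 10, the bias term starts to dominate after a handful. That matches the claim: tiny reference classes are nowhere near the threshold.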

There’s a counterargument that adding less-relevant data points by expanding the reference class can increase the variance of the forecast too much. I’m not sure this is a practical objection, though. For one, improved accuracy of a point estimate is valuable for project planners who still aren’t comfortable analyzing distributional forecasts. And even if the estimated distribution of outcomes is wider than it “should be”, this serves to make planning err on the risk-averse side.

I would like to learn more about techniques for actually doing reference-class forecasting, specifically how to construct a good reference class. Endnote 15 on page 224 looks like a pretty extensive bibliography on this.

I appreciated the discussion of taking preventive measures to identify long-tail risks and mitigate them as part of the project plan. This is just failure mode and effects analysis (FMEA)! It’s worth gaming this out during planning; I wonder what the most efficient way is to do this iteratively over planning iterations at different resolutions (see the section above).
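
For readers who haven’t run into FMEA, its core is small enough to sketch: score each failure mode for severity, likelihood of occurrence, and difficulty of detection, then let the product prioritize mitigation effort. The failure modes and 1-10 scores below are invented for illustration:

    # Toy FMEA: score each failure mode for severity, occurrence, and
    # difficulty of detection (1-10 each), then rank by risk priority
    # number (RPN = S * O * D). All entries are invented for illustration.
    failure_modes = [
        # (description, severity, occurrence, detection)
        ("key supplier delivers late",     7, 6, 4),
        ("permit approval stalls",         8, 3, 2),
        ("hidden structural damage found", 9, 2, 8),
        ("scope creep from stakeholders",  5, 8, 5),
    ]

    ranked = sorted(failure_modes, key=lambda r: r[1] * r[2] * r[3], reverse=True)
    for desc, s, o, d in ranked:
        print(f"RPN {s * o * d:4d}  {desc}")

Re-scoring a table like this at each level of planning resolution - whiteboard, simulation, scale model - might be one way to do the iterative version I’m wondering about.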

What’s your Lego?

This chapter - and the emphasis on modularity and scale - represents a whole other topic that’s extremely valuable on its own. The main response I want to emphasize is that this nicely complements the earlier discussion on the value of experience. Designs that are modular and repeatable automatically generate tons of practice opportunities! To the extent that this is also automatable, we can then also engineer out other failure modes. For such an extensive topic, I thought this was an appropriately sized mention in service of the book’s point.

Teambuilding

In contrast, I found Chapter 8 the least concrete of the major sections of How Big Things Get Done. Aside from strong storytelling with the case study of Heathrow Airport Terminal 5, most of the relevant lessons boil down to:

  • It’s less risky to hire an existing, experienced team than to assemble a new one.
  • If you must assemble a new team, make sure positive incentives are aligned to reward cooperation.
  • It’s worth investing in culture so the team buys into one identity and a shared purpose.
  • Leadership needs to understand what makes team members feel valued and provide that.

Phrased this way, the advice seems uncontroversial and even trite - but the fact that so many organizations are terrible at this is evidence that these lessons are highly nontrivial! As with the above discussion on “What’s your Lego”, this is a much bigger topic than can fit in a single chapter, so I understand why it seems so superficial. I think it is probably appropriately sized for this book, though I would have appreciated a little more insight into how one accomplishes the above. For example, what made BAA’s workplace posters effective in building team spirit, versus other attempts that come across as obvious propaganda?

Conclusions

At 190 pages excluding appendices, acknowledgments, endnotes, etc., I found How Big Things Get Done worthwhile for the effort I put into it. Even though I was already familiar with several of the topics discussed, reading them with a different phrasing and terminology encouraged me to take a fresh perspective on them. I especially liked how practical many of the key insights are, as advertised. This is not the same as being literal or concrete! Rather, I found myself actively wondering how I would operationalize something like iterating more in cheap contexts than in costly ones; credit to the authors if that was their intention! Without getting into too much detail, as I worked through the book I also found myself pattern-matching each topic to past professional projects that were sources of frustration. I will definitely take guidance from this as I move forward with some personal projects over the next few weeks - and others in the future.