Open Source - Research Questions for the NSF
Ben Hyde
Apache Software Foundation and Intuit
January 27, 2002


For the past several years, a new active Apache web server has appeared online every six seconds, a trend seemingly unaffected by the travails of the Internet's commercial sector. Clearly Open Source has become a vast enterprise. It would be nice if we actually understood what's going on here.

Something extremely valuable to society is going on here. But what is going on? We have plenty of ornate insta-theories, the handiwork of many clever people. I'll share one of mine with you in a moment.

What we lack are models and data to back up these theories. It seems clear to this observer that there are huge untapped opportunities to make the engine of open source run faster, smoother, better. Imagine that!

Let me quickly enumerate some models of open source that have been useful to me, and a few of the questions they raise. Let me foreshadow them by mentioning four terms: virtuous cycle, public good, network effect, and club good.

Open source thrives when you get a strong cycle going. This cycle spins around a loop like so:

A breakdown at any point in this loop will sap the strength of a project. Each stage in this loop, and the ecology at large, deserves thoughtful analysis. What's exciting is that we have a huge set of examples that can be leveraged for both analysis and experimentation; what better laboratory could you ask for?

Let's take one stage in this loop as an example. CVS, the tool used to manage most source repositories in the community, is more than a decade old. It plays a key role in managing the flow of innovations into the source code. Why has it evolved so little? It could use a solid kick in the butt.

We have serious problems just coordinating the queue of incoming patches. We lack good tools for managing the aggregation and evolution of ideas (as opposed to patches) and for shifting the design center of our projects. Or consider another example: lots of people read our code, but we don't currently capture any of their thoughts about that code back into the shared asset.

This is a virtuous cycle. It is driven by many small incentives that overcome small barriers. Consider distribution, for example: I can set up a mirror for the Apache distribution in less than half an hour; my reward is a warm feeling, a good story, or maybe a thank-you from a nice person. It's hard to understand this ecology because these benefits, and the barriers they overcome, are often very small and very diverse. Which brings us to more questions we need help answering.

This virtuous cycle is the lifeblood of an Open Source project. The faster it spins, the more vital and valuable the project becomes. As it gets hotter and hotter, a funny thing happens: the project becomes less about the elegant, innovative features of the software and more about the community around the software.

When the project begins to exhibit this network effect, the whole game begins to change. This brings forth quite a few additional questions.

Given that open source projects strive to be a public good, i.e., non-rival and non-excludable, how are we doing?

In point of fact, open source projects are not entirely public goods. We do regulate them: we decline to let 'just anybody' modify the master copy of the sources. Economists call such things club goods. We draw a fence around the master copy of the sources. This fence is our cell wall, and it is critical to the functioning of an open source project. Its engineering deserves some serious research.

For example, in the Apache HTTPD project we strive to maintain this fence so that the people entrusted to edit the master copy earn this right on 'merit'. The rules we use to define merit have been tuned over the years, and they work reasonably well. But I doubt they are the only rules one could adopt, and they are certainly likely to color the nature of the team that manages the sources. For example, only recently did we draw in a core member who works mostly on documentation.

Well, that's all, folks...

Ben Hyde
Last modified: Sun Jan 27 10:51:18 EST 2002