Wednesday, July 29, 2009

Embrace technical debt

Financial debt plays an important and positive role in our economy under normal conditions. Yet, especially in times like these, it’s easy to rail against the badness of being in debt; it’s a very human feeling. Remember Polonius’s advice in Hamlet?

Neither a borrower nor a lender be;
For loan oft loses both itself and friend,
And borrowing dulls the edge of husbandry.

Technical debt works the same way, and has the same perils. Here’s one of my favorite introductions to the subject, courtesy of Martin Fowler:

In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.

The human tendency to moralize about debt affects engineers, too. Many conclude that technical debt is a bad thing, and that teams that incur technical debt are sloppy, irresponsible or stupid.

In this post, I want to challenge that idea, by talking about real-world situations where debt is highly valuable. I hope to show why lean and agile techniques actually reduce the negative impacts of technical debt and increase our ability to take advantage of its positive effects. As usual, this will require a little theory and a willingness to move beyond the false dichotomy of “all or nothing” thinking.

I won’t pretend that there aren’t teams that take on technical debt for bad reasons. Many legacy projects become completely swamped servicing the debt caused by past mistakes. But there is more to technical debt than just the interest payments that come due. Startups especially can benefit by using technical debt to experiment, invest in process, and increase their product development leverage.

In a startup, we should take full advantage of our options, even if they feel dirty or riddled with technical debt. Those moralizing feelings are not always reliable. In particular, try these three things:

Invest in technical debts that may never come due.
The biggest source of waste in new product development is building something that nobody wants. This is a sad outcome which we should work very hard to avoid. Yet there is one silver lining when it does happen: we wind up throwing out working code, debt-riddled and elegantly designed alike. This happened quite often in the early days of IMVU.

For example, I’ve talked often about our belief that an instant messaging add-on product would allow IMVU to take advantage of a network effects strategy. Unfortunately, customers hated that initial product. The thousands of lines of code that made that feature work were a mixed bag – some elegantly designed and under great test coverage, others a series of hacks. The failure of the feature had nothing to do with the quality of the code. When we abandoned the product, many technical debts were summarily cancelled. Had we taken longer to get that feedback by insisting on writing cleaner code, the debt would have been much deeper.

Accept that good design sometimes leads to technical debt anyway.
Discussions of technical debt are usually framed this way (again from Martin Fowler):

The metaphor also explains why it may be sensible to do the quick and dirty approach. Just as a business incurs some debt to take advantage of a market opportunity developers may incur technical debt to hit an important deadline.

This framing takes for granted that the quick and dirty approach will incur significantly more technical debt than the slow and clean approach. Yet other agile principles suggest the opposite, as in YAGNI and DoTheSimplestThingThatCouldPossiblyWork. Reconciling these principles requires a little humility.

Most of us think we know a good design when we see it. Unfortunately, no matter how much up-front analysis we do, until the design is tested by actual practice, we can't really know. Outside the world of hypothetical examples, it's more important to make continual progress than to build the ultimate design.

For example, at a previous virtual world company, we spent years developing an architecture to cope with millions of simultaneous users. Unfortunately, we made two critically flawed assumptions: that customers would primarily consume first-party assets that we shipped to them on CD and that they would tend to congregate in a relatively uniform way. Neither assumption proved remotely accurate. The design failure meant that there was constant thrashing as the servers struggled to provision capacity according to the “elegant” algorithm we’d designed.

As in many scalability decisions, we’d have been much better off investing in agility, so that we could change the architecture in response to actual customer demand, rather than trying to predict the future. That’s what Just-in-time Scalability is all about. Sometimes quick and dirty actually incurs less debt.

Leverage product development with open source and third parties.
Financial leverage refers to investing that is supplemented by borrowed money. Similarly, product development leverage refers to situations in which our own work is fortified by the work of outsiders. For example, early on at IMVU, we incorporated tons of open source projects. This was a huge win (and we were delighted to give credit where it was due), because it allowed our initial products to get to market much faster. The downside was that we had to combine dozens of projects whose internal architectures, coding styles, and general quality varied widely. It took us a long time to pay off all the debt that this incurred – but it was worth it.

In addition, third-party services and APIs enabled us to do more with less, but at a cost: taking on the technical debt of products and teams outside our direct control. We’re not accustomed to accounting for technical debt that occurs in code that we don’t write, but this is shortsighted. It’s important to learn to see the whole system that makes our product work: human as well as machine, internal as well as external.

For example, IMVU’s early business model was made possible by PayPal’s easy self-serve and open access payment system. However, we’ve often had to put up with unreliable service, caused by their inflexible internal architecture. We had to live with their technical debts without being able to repay them. It was still a good trade.

Not all debts are created equal.
Interest rates vary, so we should be selective about taking on new debts. Given the choice between incurring technical debt in a particular end-user-visible feature and incurring the same level of debt in a core system, I’d much prefer the former. Here’s why:

  • There’s a chance that I’ll never have to pay for that particular debt, because the feature may have no value for customers.

  • It’s possible that the feature, even with debt, might be good enough, and therefore not need revision for a long time. Technical debt manifests as rigidity or inflexibility. When modifying a part of the product afflicted by debt, the work requires a lot of extra – and unpredictable – clean up. But if a given feature is rarely modified, its debt is much less expensive.

The opposite is true with debt in a core system; it’s much more likely that this debt will slow down our ability to make changes later on. For example, an unreliable library deep in the core will manifest as intermittent defects all throughout the product, each of which is hard to localize and debug. Side-effects that reduce agility are the most damaging symptoms of technical debt.
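One rough way to see why interest rates differ: treat the interest on a debt as the extra effort paid each time the indebted code is modified, and note that a debt in a feature customers reject is never paid at all. The sketch below is a toy model under invented assumptions (the `Debt` fields, the flat per-change cost, and every number are hypothetical, chosen only for illustration). It also ignores how unpredictable the clean-up work is, which makes real debts costlier than this.

```python
from dataclasses import dataclass


@dataclass
class Debt:
    """A hypothetical model of one technical debt (all fields are illustrative)."""
    name: str
    extra_cost_per_change: float  # extra hours each time the indebted code is modified
    changes_per_month: float      # how often the indebted code is touched
    survival_probability: float   # chance the code is kept rather than thrown away


def expected_interest(debt: Debt, months: int) -> float:
    """Expected extra effort ("interest") paid over a horizon, in hours.

    If the code is thrown away, the debt is cancelled and no interest is paid;
    otherwise interest accrues on every modification.
    """
    return (debt.survival_probability
            * debt.extra_cost_per_change
            * debt.changes_per_month
            * months)


# Same per-change cost, very different exposure.
feature = Debt("rarely-touched feature", extra_cost_per_change=4.0,
               changes_per_month=0.5, survival_probability=0.3)
core = Debt("core library", extra_cost_per_change=4.0,
            changes_per_month=10.0, survival_probability=1.0)

for d in (feature, core):
    print(f"{d.name}: {expected_interest(d, months=12):.1f} hours")
```

With identical per-change costs, the core-system debt dominates (480.0 hours versus 7.2 over a year in this made-up example) simply because it is touched far more often and cannot be cancelled by dropping a feature.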

Lean vs. debt
In the world of physical goods, the leaner a supply chain is, the less debt is required to operate it. This makes lean supply chains more robust in the face of the unexpected: if sales suddenly dry up, they are stuck with less unsold inventory and simultaneously have less debt to service. The just-in-time nature of the value chain reduces risk in the face of uncertainty and is also more capital efficient.

A similar relationship applies to technical debt. Teams that practice an agile or lean development process are able to minimize the accumulation of technical debt without sacrificing speed, because they work in smaller batches. They also take better advantage of debt, because they find out sooner if a particular investment has paid off. Traditional development teams, by contrast, often build and deploy large systems before learning if their early choices were sensible, and therefore wind up with a much larger debt to pay. In fact, by the time they become aware of it, they’ve already started to pay significant interest on that debt.

Invest in speed instead of features or debt
This relationship between lean and debt opens up new approaches for dealing with technical debt. The usual debate is phrased as an either-or choice between taking more time to “build it right” or taking a shortcut and incurring more debt. But those are not our only two options. Taking on technical debt does free up energy to invest elsewhere, but new features are not the only place to invest it.

We can trade technical debt for process improvement, too. If that improvement pays off (by reducing the batch size of our work, for example), it becomes easier to address all technical debt in the future – including the debt just incurred. And because any particular debt might never come due, this is a better trade. To take one concrete example, it’s often worthwhile to write test coverage for legacy code even without taking the time to refactor.

This reverses the standard intuition about what engineering activities add value, which usually concludes that test coverage is a form of necessary waste but a refactoring is value-added work. However, a refactoring (by itself) might go stale or introduce unintended side-effects. Adding test coverage will make it easier to refactor in the future and also reduce our fear of making changes elsewhere.
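As a concrete illustration of covering legacy code without refactoring it, here is a minimal characterization-test sketch using Python's standard `unittest` module. The `legacy_shipping_cost` function and its behavior are hypothetical; the point is that the tests record what the code does today, quirks included, so that a later refactoring can be checked against them.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical legacy code: duplicated branches and magic numbers,
    # but it works and customers depend on its exact behavior.
    if express:
        if weight_kg > 10:
            return 25 + (weight_kg - 10) * 2
        else:
            return 25
    else:
        if weight_kg > 10:
            return 10 + (weight_kg - 10) * 1
        return 10


class LegacyShippingCostCharacterization(unittest.TestCase):
    """Pin down what the code does today, not what it 'should' do."""

    def test_standard_under_threshold(self):
        self.assertEqual(legacy_shipping_cost(3, express=False), 10)

    def test_standard_over_threshold(self):
        self.assertEqual(legacy_shipping_cost(12, express=False), 12)

    def test_express_at_threshold_gets_flat_rate(self):
        # Exactly 10 kg is still charged the flat rate; a refactoring must keep this.
        self.assertEqual(legacy_shipping_cost(10, express=True), 25)

    def test_express_over_threshold(self):
        self.assertEqual(legacy_shipping_cost(11, express=True), 27)
```

The tests deliberately leave the tangled implementation alone. They pay off later: when someone does refactor, any change in behavior shows up as a failing test (run them with `python -m unittest`).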

Investing in the dynamics of development is more valuable than investing in the static status quo. Startups are always moving, so invest in moving faster and better.

Technical debt in the real world
So far, all of these considerations have been framed in the form of abstract either-or tradeoffs. Real life seldom presents such comparable choices. Instead, we balance lots of unknowns. How much technical debt will a particular approach incur? How likely will customers ultimately use that feature? How painful will it be to refactor later? How much will it slow us down in the meantime? And how much more expensive would it be to do it right? Oh, and how likely is it that the “right” approach actually is right?

Luckily, there are better options for these complex decisions than picking an easy extreme, like “never incur technical debt” or “anything goes.” Instead, we can choose a disciplined approach to making proportional investments in prevention and paying down debt, such as Five Whys. Such techniques work by focusing our energy on making process and technical changes in precisely those areas that are causing the biggest waste and slowdown.

This is better than making abstract choices about where to invest: better design, paying down old debts, or better process. Instead, techniques like Five Whys teach us to view the entire application and product development team as one integrated system. From this holistic viewpoint, we can optimize accordingly.

Once we can see opportunities for truly global efficiency gains, all that remains is to ensure our team actually makes room for those investments. To do that, we add specific speed regulators, like integrating source control with our continuous integration server or the more elaborate dance required for continuous deployment. This produces a powerful combination: the speed of just-in-time experimentation wedded to a discipline of rigorous waste-reduction.

One last thought. When I talk and write about the advanced product development process at IMVU today, like the cluster immune system or the disciplined approach we take to split-testing and interaction design, it may sound as if we had that capability from the start. Nothing could be further from the truth. The early IMVU was riddled with legacy code and technical debt. We spent endless hours arguing about whether we’d made the right choices in the past. And with the benefit of hindsight, it’s clear that we often made serious mistakes. As one engineer recently told me, “Once we had money in the bank and were near-profitable, I think we would have been well-served by increased up-front product and technology planning. As a culture, we hadn’t yet learned how to make long-term decisions.” He’s right.

In the end, what mattered wasn’t that we did everything right, but that our fundamental approach was flexible and resilient. At no point did we stop everything and do a ground-up rewrite. Instead, we incrementally improved our process, architecture, and infrastructure, always learning and adjusting. The blur you see today is the result of the beneficial compounding interest of that approach applied with discipline over many years. Trust me, it’s a lot of fun.

(This post was tremendously enhanced by a number of early readers from the Twitterverse. You know who you are. Thanks so much.)



  1. Eric, great article - provides context and vocabulary for dialog on a very interesting tension in startups.

    Technical debt is a necessary instrument, but does cause real pain in the lives of developers - in terms of lost time and in terms of the psychological pain of going against engineering principles and experience. ;) But I think the article strikes the right tone in terms of balance.

    Very much appreciate the suggested solutions (Five Whys, split testing etc)

  2. This is a good analysis of technical debt, but I have a hard time extracting a clear point from it.

    The title 'embrace technical debt' and much of your early wording suggests that technical debt is 'valuable' and even advocates it as something that's desirable, but most of the points you make merely provide reasons why technical debt is an acceptable cost for making progress, or an unavoidable consequence of working with third parties.

    I really like the depth and breadth of your analysis, but the concept of embracing technical debt makes me a bit nervous.

    I think you hit this point on the head near the end by suggesting the application of a technique like Five Whys for balancing the two extremes (disallowing debt entirely, and considering any form of debt acceptable).

    I'm not sure that it makes sense to consider, say, PayPal's subpar API as technical debt. You pay an ongoing cost in order to use that API, in the form of maintenance and debugging, but it's not as if it's debt you decided to take on and can decide to pay off later on down the road. The PayPal team might have technical debt in *their* code base, but as a consumer of their API, you don't.

    Now, on the other hand, if your codebase for dealing with PayPal also incurs technical debt, perhaps as a result of trying to quickly work around their issues, or fixing problems without fully understanding them - then that makes sense. But that's not necessarily a given.

    I am also uncertain about your suggestion that good design inevitably leads to technical debt.
    If you have a piece of well-designed, well-implemented code that doesn't help you meet your goals, I wouldn't call that technical debt. The code does exactly what was intended, and it's easy to understand, improve upon, or throw out, because you factored it correctly in the first place.

    That feels more like wasted effort as a result of bad decisions than technical debt to me. And in that particular case, I would be especially cautious about embracing *that* kind of 'technical debt', since one thing programmers enjoy doing is building things the company doesn't need :)

    The line is admittedly pretty blurry in some cases; if the entire problem you're trying to solve is a technical one, I suppose you'd be well-justified in labelling almost any historical screwup or bad decision as 'technical debt'.

  3. Kevin, really appreciate the thoughtful comments.

    I want to challenge the idea that you can have "well-designed, well-implemented code that doesn't help you meet your goals." My feeling is that good design has to encompass the goals of the artifact being designed. Unfortunately, in a lot of situations, the true goals aren't really clear until the design is complete or nearly-so. In that case, what you thought was good design can suddenly turn out to be really bad design. I think that's a kind of technical debt.

    That's still not to say that there aren't just regular old bad designs out there - but I want us to pay more attention to this particular case because it is so often overlooked. It's not that good design "inevitably" leads to debt, it's that spending time on good design doesn't necessarily prevent debt. In such cases, you're better off experimenting or investing in lean process.

    With regard to third-party APIs, I stand by the idea that you should consider all the debt that affects your team, whether you control it or not. When you incorporate a third-party library that is debt-ridden, that counts. When a third party hosts that same library for you, and you access it via an API, it should count, too. The same intermittent bugs that you waste time debugging in your own code can happen behind an API. You'll still waste a lot of time attempting to debug them.

  4. It is so true, but non-tech managers usually try to ignore the facts away.

  5. Hi, Eric. As usual, an interesting post.

    Advocating for use of technical debt is sort of like saying American consumers could make good use of more financial debt. This is true for some small number of very responsible people, but extremely dangerous advice for the majority, who already are in a lot of debt, have poor debt management skills, and haven't solved the personal and cultural issues that got them in so deep in the first place.

    That said, I agree with a lot of your more detailed points. It definitely can be useful to judiciously take on small amounts of debt. There is also some level of architectural exploration where there are so many unknowns that it makes sense to throw away all good technical practice and just hack. (The Extreme Programming folks call those "spikes", and generally don't check them in.) From a product perspective, it can also be very useful to try something quick out to see what the response is before investing more deeply.

    However, I get the impression you're mixing the notions of something that's well designed from the product perspective and something that's well designed from the engineering perspective. These kinds of design are mostly orthogonal. Technical debt only refers to the latter.

    The questions of whether something should ship on CDs, or what the user behavior will be, or how many users to build for? Those are product questions. Technical debt refers only to the technical decisions made to meet those product goals.

    I agree totally with you that a disciplined, experienced team with the right culture will eventually learn to make the right decisions in regard to this.

    However, until engineers have spent at least a year developing in a very low-debt code base, they don't have the right experience, and will tend to go down the familiar high-debt path. They quite literally cannot imagine what people like Fowler and Beck are talking about. It's like asking a 1930s coal miner to be good to the environment.

    And I want to close by mentioning, mainly as a caution to readers who aren't as familiar with your work, that the approach you describe can only work in the cultural context you promote. I have never seen (or even heard of) a normal team in your average large organization ever learn to manage technical debt dynamically. Poor feedback loops and command-and-control nonsense always dump them in the weeds. For them, "no technical debt" is a great rule to live by.

  6. > However, I get the impression you're mixing the
    > notions of something that's well designed from
    > the product perspective and something that's well
    > designed from the engineering perspective. These
    > kinds of design are mostly orthogonal. Technical
    > debt only refers to the latter.

    Thanks for another thoughtful comment. I'll continue to push back on this distinction. The technical design of a product is intimately linked to its product design. For example, how fault tolerant should it be? A good technical design should have "just the right amount" of fault tolerance. How costly is a failure? How does the fault tolerance affect our ability to make changes in the future? How desirable is the ability to make changes in the future? The only way to answer these questions is to ask product questions, yet they are essential for a good technical design.

    Even more tricky is the fact that there are interaction effects between the two designs. The faster we get the technical design done, the faster we start to get feedback about _both_ the technical and product designs. We have to look at design as a dynamic process. Sometimes choosing a worse design results in a better design.

    That's not a practice, that's a fact. Deciding how to deal with that fact is what we have to struggle with when we build teams.

    I don't think that "spikes" are sufficient, although they are very useful in situations where we know in advance we are going to throw away our experimental code. But most design problems occur midway through implementation right inside a larger codebase. What then? There's nothing to throw away, unless we revert - but having the ability to revert large amounts of work implies that we're working with a very large batch size.

    I think we're much better off focusing on shrinking the batch size of work. That gives us better options and faster feedback with regard to debt - and design.

  7. Nice post~

    I really like the idea of writing test coverage for legacy code before actually making the changes. Unfortunately, some legacy projects make test coverage almost impossible, at least in strongly typed languages such as Java. Test harnesses usually work with MVC architecture, which is still a newer concept as far as big business goes.

    Either way I feel that it is a great idea if the Test to Implementation timeline is about 1 to 1.

  8. I would suggest (from my experience) that getting it right first time is actually cheaper and faster than making what you think are short cuts at the time. Poor testing or poorly separated concerns within your code base often come back to bite you a lot sooner than you think.

    If you're gonna play the game of "when is it OK to create technical debt" you firstly have to know that it's debt in the first place, but also have to have a very clear idea of the consequences. How many of us could say we're in a position to be able to do that? I don't think I am.

    As always, it's our man Ward Cunningham who came up with the phrase technical debt. Interestingly, he never intended it to be inferred that technical debt is something you could make the conscious decision to accrue:

    "I'm never in favor of writing code poorly, but I am in favor of writing code to reflect your current understanding of the problem even if that understanding is partial."

  9. Excellent article.

    I find technical debt a really helpful metaphor when introducing new teams to the responsibility they and their product managers have for maintaining the quality of the product over time and ensuring the lasting value of their company's investment.

    It's good to see that the metaphor continues to hold up and you have demonstrated a number of insights and opportunities that present themselves when you incur debt wisely.

    There is a dark side!

    Often technical debt is incurred with due consideration but without intent or budget for payback. I have seen too many systems being rewritten or suffering from crippled productivity due to a management team that does not expect or understand the nature of technical debt.

  10. Eric, could you please explain what you mean by "process" ?

  11. Hi, Eric. I think the confusion arises because you include more things under "technical" than I do. And I think my notion of the split is roughly where Beck and Fowler would draw the line, too.

    I agree the two kinds of design are somewhat related, and that technical people will benefit from understanding the deeper purpose. But I think they're still mostly orthogonal. Let's consider cases:

    Can you have a piece of software with good product design and bad technical design? Yes. In that case, the user experience and system performance might be good, but the cost of future development would be high.

    Can you have a piece of software with bad product design and good technical design? Surely. Market fit or user experience might be awful, but engineers looking at the code base would find little to quibble about.

    For your example of the right amount of fault tolerance, I believe that's mainly a business decision. I encourage putting that on cards and treating them as part of the regular feature flow. How to implement a given level of reliability is the technical decision. For more examples, see the "-ilities" section here:

    Design is definitely dynamic, and definitely benefits from feedback, so I'm also in favor of shipping early and often. Small batches rule.

    However, I don't think technical quality has to get in the way of that. If one is truly doing incremental design, then you rarely need to do much technical design to get something quick and easy out the door. Low technical debt now doesn't mean that you don't improve the design later as the feature gets bigger. It just means that this tiny or provisional thing is shipshape given that it is a tiny or provisional thing.

  12. > Eric, could you please explain what you mean by "process" ?

    Sure. I don't have a formal definition, but what I mean is the set of practices that we adopt as a team to build our product. How do we make decisions about the platform? How do we prioritize new features? How do we debug them? How do we deploy them?

    Our process may include the use of tools and infrastructure, too. So when I say something like "invest in process" I intend that to mean that we can evolve our tools and our use of them, too.

    Does that help?

  13. Eric, thanks for the post.

    A couple of points I thought worth mentioning.

    First, I really find it hard to imagine a circa-2009 web startup that doesn't start with a huge technical debt. Absurdly, this holds both for startups that are thinking structurally about customer discovery (and hence want to experiment first and design nicely second) and for startups that still follow the good old product development path and are hence urgently pushing to release ASAP.

    I think that if a web founder came to me asking for an investment, talking about how he's coding an immensely scalable, perfectly designed system, I would throw him out in an instant. Perhaps the best example of that was Cuil, which was actually built on the ability to scale faster and better, and found out that that's not as important as figuring out what urgent problem you solve, and for which customer.

    This, by the way, is not true of all aspects of software and all realms of life. For example, the rating system of the iPhone App Store makes it extremely important for a new app's front end to be very close to perfect from day one, even if the backend is held together by a shoestring and a piece of gum.

    Finally, sometimes embracing technical debt happens on a much larger scale than we can imagine. For example, back in the early 90's two multi-billion dollar R&D projects were racing to build a missile defense system: the Israeli Arrow system and the US THAAD. The Israeli strategists made a conscious decision to incur a huge technical debt by building a system they knew would only answer a certain portion of the problem, and would need to be gradually evolved and replaced (i.e. pay 'interest'). The Americans, on the other hand, decided to build a full system, perfectly designed, with no shortcuts. The result was that the Israeli system became operational 10 years ago, while the THAAD project is finally starting to be deployed this year after 10 years of delay. On the other hand, now that it's deployed, it's a much more capable system.

    This example shows that the type and extent of debt really depend on your strategic objectives. If you're an Israeli who needs something ASAP because the threat of being fired on with Scuds is rather real, then taking on huge technical debts makes a lot of sense. If you're an American who can take all the time and most of the money in the world to get it just right, then you might as well do that :)

    Keep up the good work,


  14. > Low technical debt now doesn't mean that you
    > don't improve the design later as the feature
    > gets bigger. It just means that this tiny or
    > provisional thing is shipshape given that it is a
    > tiny or provisional thing.

    Let me push back a little more. It's not that I disagree that product and technical design are separate, it's just that I think we under-appreciate the level of interaction, especially under conditions of uncertainty such as those that startups face.

    Let me take a concrete example. Remember IMVU's initial IM add-on product? It had a pretty good technical design. Here's why:

    - it kept each IM network in its own separate module, and made it really easy to add new IM networks by composing a set of common objects
    - it separated the underlying transport from the IM "session" itself, so it was robust in the face of the underlying client acting strangely, going away, or even having conversations switch clients altogether
    - it compacted all of its information into brief, human-readable text messages that could be sent over any IM network in the clear

    Those were strictly technical design decisions, and I think they were really good. Unfortunately, when we realized the product design was not what customers wanted, we had to pivot to a new product. But we had to bring that old codebase with us. Now the assumptions and abstractions that had served us well started to serve us badly. When we became a standalone network, it didn't matter how easy it was to add new networks, since we never did. And having the session abstracted from the transport made debugging much harder. Worst of all, the plaintext codes we were used to sending were considered non-authoritative, since they could be pulled off a third-party network. This made the actual transport much more difficult on our first-party network than was really necessary.

    As a result, we have had to be constantly refactoring this design, a little bit at a time, to smooth out these rough edges. These design changes feel a lot like the interest payments incurred by technical debt. My argument is that there is no distinction to be had. That "good design" turned out to be technical debt, after all.

    What I object to most is the idea that technical design is a linear quantity. There's no such thing as "improving the technical design" in any absolute sense. You can only improve it with regard to whatever the purpose of the current product is. When that purpose is changing, we're necessarily chasing a moving target.

    I think the reason our intuitions as engineers are so messed up here is that they were formed in an era when the rate of change was simply much lower. When we work on a product that takes many years to design and build, it's natural to start seeing design decisions on a linear scale, because the product's purpose is clear and relatively stable throughout. Imagine the architect of a cathedral trying to make sense of this distinction. They'd think we were crazy.

    So far, this is all an academic exercise in naming - but I think it has an important real-world effect. Once we recognize that time invested in good design can wind up as technical debt, we can start to ask ourselves "what else should we invest that time in?" And, as I tried to outline in the article, process improvements that reduce overall batch size might be a better choice.

  15. Darn good - I have struggled in relevant conversations with our in-house move to scrum/agile. Things like "where is the design time?" and "isn't planning on refactoring just waste?". It all seems so clear to me, but this blog put it into better perspective. Good job.

  16. Eric, great post. I'd like to understand how embracing technical debt works for you with a cross-functional problem team and prototyping.

    Are the engineers in the customer development team allowed to push quick and dirty "prototypes" to production?

    It would certainly make sense to validate an internal feature proposal with split-testing, but I'm worried about the safety measures to avoid major customer complaints and losing customers:
    * Would the continuous deployment immune system be able (by itself) to prevent it?
    * Should the test subjects be part of the customer advisory board?
    * What would be the profile of these engineers? It makes sense to me to put seasoned programmers here, especially if they had made the shift to "marketing" after frustration with the lack of customer development.

    Wow, a lot of questions. I just hope it's not too late to add a comment to this post.