The interesting thing about fear is that reducing it requires two contradictory impulses. First, we can reduce fear by mitigating the consequences of failure. If we construct areas where experimentation is less costly, we feel safer and are therefore more willing to try new things. Second, and seemingly in the opposite direction, we can reduce fear by engaging in the feared activity more often. By pushing the envelope, we challenge our assumptions about the consequences and get better at the thing we fear at the same time. Thus, it is sometimes a good idea to reduce fear by slowing down, and sometimes a good idea to reduce fear by speeding up.
To illustrate this point, I want to excerpt a large part of a recent blog post by Owen Rogers, who organized my trip to Vancouver. I spent some time with his company before the conference, discussing ways to get started with continuous deployment, including my experience introducing it at IMVU. He summarized that conversation well, so rather than retread that material, I'll quote it here:
One thing that I was surprised to learn was that IMVU started out with continuous deployment. They were deploying to production with every commit before they had an automated build server or extensive automated test coverage in place. Intuitively this seemed completely backwards to me - surely it would be better to start with CI, build up the test coverage until it reached an acceptable level and then work on deploying continuously. In retrospect and with a better understanding of their context, their approach makes perfect sense. Moreover, approaching the problem from the direction that I had intuitively is a recipe for never reaching a point where continuous deployment is feasible.
Initially, IMVU sought to quickly build a product that would prove out the soundness of their ideas and test the validity of their business model. Their initial users were super early adopters who were willing to trade quality for access to new features. Getting features and fixes into the hands of users was the greatest priority - a test environment would just get in the way and slow down the validation coming from having code running in production. As the product matured, they were able to ratchet up the quality to prevent regression on features that had been truly embraced by their customers.
Second, leveraging a dynamic scripting language (like PHP) for building web applications made it easy to quickly set up a simple, non-disruptive deployment process. There are no compilation or packaging steps, which would generally be performed by an automated build server - just copy and change the symlink.
Third, they evolved ways to selectively expose functionality to sets of users. As Eric said, “at IMVU, ‘release’ is a marketing term”. New functionality could be living in production for days or weeks before being released to the majority of users. They could test, get feedback and refine a new feature with a subset of users until it was ready for wider consumption. Users were not just an extension of the testing team - they were an extension of the product design team.
Understanding these three factors makes it clear why continuous deployment was a starting point for IMVU. In contrast, at most organizations - especially those with mature products - high quality is the starting point. It is assumed that users will not tolerate any decrease in quality. Users should only see new functionality once it is ready, fully implemented and thoroughly tested, lest they get a bad impression of the product that could adversely affect the company’s brand. They would rather build the wrong product well than risk this kind of exposure. In this context, the automated test coverage would need to be so good as to render continuous deployment infeasible for most systems. Starting instead from a position where feedback cycle time is the priority and allowing quality to ratchet up as the product matures provides a more natural lead-in to continuous deployment.
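The "copy and change the symlink" step from the quoted post is easy to make atomic. What follows is a minimal sketch in Python, not IMVU's actual tooling: it assumes a POSIX filesystem and a web server whose document root is a symlink (called `current` here), and the function name `deploy` and the directory layout are hypothetical. Each release is copied into its own directory, and the live symlink is swapped with a single rename, so requests see either the old release or the new one, never a half-copied tree.

```python
import os
import shutil
import tempfile


def deploy(source_dir: str, releases_dir: str, current_link: str) -> str:
    """Copy source_dir into a fresh release directory, then atomically point
    current_link at it. Returns the absolute path of the new release."""
    os.makedirs(releases_dir, exist_ok=True)
    # One uniquely named directory per release; older ones stay on disk for rollback.
    release_dir = os.path.abspath(tempfile.mkdtemp(prefix="release-", dir=releases_dir))
    # No compile or package step: the script files are simply copied into place.
    shutil.copytree(source_dir, release_dir, dirs_exist_ok=True)

    # Create the new symlink under a temporary name beside the live one...
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted deploy
    os.symlink(release_dir, tmp_link)
    # ...then rename it over the live link. On POSIX, rename() replaces the
    # destination in one atomic step, so readers never see a missing link.
    os.replace(tmp_link, current_link)
    return release_dir
```

Because earlier release directories are left in place, rolling back amounts to pointing the symlink at one of them in the same way.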
The rest of the post, which you can read here, discusses the application of these principles to other contexts. I recommend you take a look.
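The third factor in the quoted post, exposing new functionality to only some users, is commonly done with feature flags. The sketch below is a hypothetical Python illustration of that general technique, not a description of IMVU's system: the `Feature` class, its allow-list, and its percentage rollout are assumptions made for the example. Hashing the user id gives each user a stable bucket, so the same person keeps seeing the same behavior while the rollout percentage is ratcheted up.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature flag: the code is live in production, but only
    some users see it until rollout_percent reaches 100."""
    name: str
    rollout_percent: int = 0                          # share of users, 0-100
    allowlist: set[str] = field(default_factory=set)  # e.g. staff, early adopters

    def bucket(self, user_id: str) -> int:
        # hashlib (not the built-in hash(), which is salted per process) gives
        # every user a stable bucket from 0 to 99; mixing in the feature name
        # means each feature samples a different slice of users.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def is_enabled(self, user_id: str) -> bool:
        if user_id in self.allowlist:
            return True
        return self.bucket(user_id) < self.rollout_percent
```

Shipping a feature with `rollout_percent=0` and a small allow-list, then raising the percentage as feedback comes in, is one way to make "release" a configuration change rather than a deployment.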
Returning to the topic at hand, I think this example illustrates the tension required to reduce fear. In order to do continuous deployment at IMVU, we had to handle fear in two ways:
- Reduce consequences - by emphasizing the small number of customers we had, we were able to convince ourselves that exposing them to a half-baked product was not very risky. Although it was painful, we focused our attention on the even bigger risks we were mitigating: the risk that nobody would use our product, the risk that customers wouldn't pay for virtual goods, and the risk that we'd spend years of our lives building something that didn't matter - again.
- Fear early, fear often - by actually doing continuous deployment before we were really "ready" for it, we got used to the real benefits and consequences of acting at that pace. On the negative side, we got a visceral feel for the kinds of changes that could really harm customers, like commits that take the whole site down. But on the plus side, we got to see just how powerful it is to be able to ship changes to the product at any hour of the day, to get rapid feedback on new ideas, and to not have to wait for the next "release train" to put your ideas into action. On the whole, it made it easier for us to decide to invest in preventive maintenance (i.e., the Cluster Immune System) rather than just slow down and accept a larger batch size.
When a new engineer started at IMVU, I had a simple rule: they had to ship code to production on their first day. It wasn't an absolute rule; if it had to be the second day, that was OK. But if it slipped to the third day, I started to worry. Generally, we'd let them pick their own bug to fix, or, if necessary, assign them something small. As we got better at this, we realized the smaller the better. Either way, it had to be a real bug and it had to be fixed live, in production. For some, this was an absolutely terrifying experience. "What if I take the site down?!" was a common refrain. I tried to make sure we always gave the same answer: "if you manage to take the site down, that's our fault for making it too easy. Either way, we'll learn something interesting."
Because this was such a big cultural change for most new employees, we didn't leave them to sink or swim on their own. We always assigned them a "code mentor" from the ranks of the more established engineers. The idea was that these two people would operate as a unit, with the mentor's job performance during this period evaluated by the performance of the new person. As we continued to find bugs in production caused by new engineers who weren't properly trained, we'd do root cause analysis, and keep making proportional investments in improving the process. As a result, we had a pretty decent curriculum for each mentor to follow to ensure the new employee got up to speed on the most important topics quickly.
These two practices worked well together. For one, they required us to keep our developer sandbox setup procedure simple and automated. Anyone who had served as a code mentor would instinctively be bothered if someone else made a change to the sandbox environment that required special manual setup. Such changes inevitably waste a lot of time, since we generally build a lot more developer sandboxes than we realize. Most importantly, we immediately thrust our new employees into a mindset of reduced fear. We had them imagine the riskiest thing they could possibly do - pushing code to production too soon - and then do it.
Here's the key point. I won't pretend that this worked smoothly every time. Some engineers, especially in the early days, did indeed take the site down on their first day. And that was not a lot of fun. But it still turned out OK. We didn't have that many customers, after all. And continuous deployment meant we could react fast and fix the problem quickly. Most importantly, new employees realized that they weren't going to be fired for making a mistake. We'd immediately involve them in the postmortem analysis, and in a lot of cases it was the newcomer themselves (with the help of their mentor) who would build the prophylactic systems required to prevent the next new person from tripping over that same issue.
Fear slows teams of all sizes down. Even if you have a large team, could you create a sandboxed environment where anyone can make changes that affect a small number of customers? Even as we grew the team at IMVU, we always maintained a rule that anyone could run a split-test without excess approvals as long as the total number of customers affected was below a critical threshold. Could you create a separate release process for small or low-risk commits, so that work that happens in small batches is released faster? My prediction in such a situation is that, over time, an increasing proportion of your commits will become eligible for the fast-track procedure.
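Both rules in the previous paragraph are simple enough to encode as an automated policy check rather than a meeting. The following Python sketch is hypothetical throughout: the thresholds, the `Commit` fields, and the function names are illustrative assumptions, since the post does not say how IMVU measured risk or where its threshold sat. It also computes the share of commits eligible for the fast track, which is the number the prediction above says should grow over time.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; the post does not give IMVU's numbers.
SPLIT_TEST_USER_LIMIT = 5_000
FAST_TRACK_MAX_LINES = 50


def split_test_needs_approval(users_exposed: int,
                              limit: int = SPLIT_TEST_USER_LIMIT) -> bool:
    """Anyone may launch a split test without sign-off while the number of
    customers it exposes stays below the limit."""
    return users_exposed >= limit


@dataclass
class Commit:
    lines_changed: int
    touches_risky_area: bool  # hypothetical flag, e.g. set for billing or schema changes


def release_track(commit: Commit) -> str:
    """Route small, low-risk commits to a faster release process."""
    if commit.lines_changed <= FAST_TRACK_MAX_LINES and not commit.touches_risky_area:
        return "fast-track"
    return "standard"


def fast_track_share(commits: list[Commit]) -> float:
    """Fraction of commits eligible for the fast track."""
    if not commits:
        return 0.0
    eligible = sum(release_track(c) == "fast-track" for c in commits)
    return eligible / len(commits)
```

Tracking `fast_track_share` week over week would be one way to check that prediction against your own team's history.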
Whatever fear-reducing tactics you try, share your results in the comments. Or, if fear's got you paralyzed, share that too. We'll do our best to help.
(with apologies to Frank Herbert)