Thursday, 17 September 2009

Rebuilding the Development Process Part 1

Early in 2008, our development programme was in trouble. We were developing a long-standing web product with a very stable, well-proven core. We'd tried to expand quickly by hiring new people, but it wasn't giving us the productivity returns we'd hoped for. The existing system worked well, but we were having a lot of trouble getting enhancements completed.

Part of the problem was that we had made a bad hire: a developer who hated the technology we use, hated us, and generally preferred playing on his PlayStation to doing any work. It dragged the team down for a while. But even after he had gone, we still had problems. We also had a backlog of feature work, and a lot of developer time was being redirected to urgent client fixes, displacing planned features and, ironically, increasing the backlog.

We had (and still have) really great people, with bucketloads of experience and technical knowledge. But in early 2008 we still weren't productive as a team. We were plagued by the usual development issues, which will be familiar to anybody who has run a big development project: new features were breaking existing parts of the system, everything was running late, and we kept having to break off to meet customer needs. Those customer needs were *always* urgent, and pushed us further off track.

We decided to make three changes to start with:
* Invest in automated test
* Create continuous integration servers
* Switch to monthly releases (see a later post)

Automated Test

We figured that with our limited test resources and a very large system, automated test was the only way to go. Just walking through the system manually took days. So we started to invest in various types of automated test:

* Functional testing, using the SOAP API that we expose for clients
* UI testing. Options for testing Web UIs are very limited. We picked Selenium, which has turned out to be a really good choice.
* Module testing, using NUnit.

We started small, with relatively few tests, and picked off some low-hanging fruit by automating tests against the existing APIs and some of the simpler pages. That was enough to get results from a modest investment.

Continuous Integration

We wanted something that would let us know as soon as a change broke something. Continuous integration (CI) using CruiseControl.NET (CC.Net) fitted the bill perfectly. It tirelessly rebuilds the latest state of the project and re-runs a key subset of the automated tests. Automating the whole build process up front was too hard, so we took a stepwise approach, first picking off the parts of the system that were easy to automate. Then, over a few months, we automated the rest of the build bit by bit.
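To make the cycle concrete, here is a minimal Python sketch of what a CI server does each time it polls source control. It is not how CC.Net is implemented: `ci_cycle`, `poll_forever` and the callables passed to them are hypothetical stand-ins for the source-control query, the build, the test runners and the team notification. Starting with a short `stages` list and appending to it over time mirrors the stepwise approach described above.

```python
import time
from typing import Callable, List, Optional

# Hypothetical: a step takes a revision id and returns True if it succeeded.
Step = Callable[[str], bool]


def ci_cycle(revision: str, build: Step, stages: List[Step]) -> Optional[str]:
    """Rebuild one revision, then run the test stages in order.

    Returns None if everything passed, or the name of the first failing step.
    """
    if not build(revision):
        return "build"
    for stage in stages:
        if not stage(revision):
            return stage.__name__
    return None


def poll_forever(get_head: Callable[[], str], build: Step, stages: List[Step],
                 notify: Callable[[str, str], None], interval_s: float = 60.0) -> None:
    """Rebuild whenever source control reports a new head revision (runs until interrupted)."""
    last_built: Optional[str] = None
    while True:
        head = get_head()
        if head != last_built:
            last_built = head
            failed = ci_cycle(head, build, stages)
            if failed is not None:
                notify(head, failed)  # e.g. email the team about the broken step
        time.sleep(interval_s)
```

Because `ci_cycle` stops at the first failing step, a broken build is reported without spending time on tests that could not meaningfully run anyway.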

Combining the two approaches, we then made the CI server run the key automated tests. Being open source, CC.Net is easy to extend and very flexible. It neatly integrated tests across all the different technologies in the system: JavaScript client code, ASP, ASP.NET, and VB and C# back-ends. So we were able to add tests to the CI process one by one.

Early success came from the rapid feedback when a check-in broke the build. CC.Net alerted the development team within a couple of hours, and the developer responsible could fix their code quickly and with minimal effort. Without continuous integration, we wouldn't have found the bugs in new check-ins for weeks or months, by which time it would have taken major investigation to identify the cause.
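Why fast feedback makes fixes cheap can be shown with a little bookkeeping: when a build goes red, the only suspects are the check-ins made since the last green build. The sketch below is hypothetical; the tuple layout, revision numbers and names are made up for illustration and are not CC.Net output.

```python
from typing import List, Tuple

# Hypothetical layout: (revision_number, author, message); revisions increase over time.
Checkin = Tuple[int, str, str]


def suspect_checkins(history: List[Checkin], last_green: int, first_red: int) -> List[Checkin]:
    """Return the check-ins that could have broken the build.

    Anything committed after the last passing (green) build, up to and including
    the first failing (red) build, is a suspect; everything else is cleared.
    """
    return [c for c in history if last_green < c[0] <= first_red]


history = [
    (101, "ann", "fix login redirect"),
    (102, "bob", "add report page"),
    (103, "cat", "refactor SOAP layer"),
    (104, "dan", "fix typo in help text"),
]
# Frequent builds: 102 was green and 103 was red, so there is one suspect.
print(suspect_checkins(history, last_green=102, first_red=103))  # [(103, 'cat', 'refactor SOAP layer')]
# Infrequent builds: 101 green, 104 red, so three check-ins need investigating.
print(len(suspect_checkins(history, last_green=101, first_red=104)))  # 3
```

With builds every hour or two, the suspect list tends to be one or two check-ins by people who still remember what they changed; with weeks between builds it can grow to dozens.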

Payback

So we were convinced that the new processes were helpful, but did they make business sense?

Setting up the CI servers took about 4 person-weeks, and on its own saved about 2 person-days per month. Since 4 person-weeks is about 20 person-days, the direct payback time is about 10 months. But the biggest savings are indirect: because everyone is developing against an up-to-date solution, bugs are noticed early and can be fixed quickly and informally, which saves bucketloads of developer time.
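The direct payback figure is just the one-off setup effort divided by the monthly saving, once both are in the same unit. A small Python sketch, assuming a 5-day working week (the constant and function names are illustrative):

```python
WORKING_DAYS_PER_WEEK = 5  # assumption: a standard 5-day working week


def payback_months(setup_person_days: float, saved_person_days_per_month: float) -> float:
    """Months of direct savings needed to recover a one-off setup cost."""
    if saved_person_days_per_month <= 0:
        raise ValueError("no direct savings, so the setup cost is never paid back")
    return setup_person_days / saved_person_days_per_month


# CI servers: about 4 person-weeks to set up, saving about 2 person-days per month.
ci_setup_days = 4 * WORKING_DAYS_PER_WEEK
print(payback_months(ci_setup_days, 2))  # 10.0
```

The same function applies to the automated-test figures once person-months are converted to person-days. It only captures direct savings, not the indirect ones that make up most of the benefit.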

Setting up the first batch of automated tests took around 3 person-months (roughly 60 working days) and directly saved around 3 days per month, giving a payback time of about 20 months, or a little over a year and a half. Not so good?
* Automated functional testing and UI testing are great, and also mean that bugs are noticed early, so they can be fixed quickly.
* Automated module tests are more controversial: some of us are keen, while one of us believes they have negative value. The upside is that they spot side effects of programming changes, as when a one-line change breaks a function. The downside is that they can make code brittle, and refactoring takes longer because the tests have to be patched too. Module tests also do not deal well with the weird intermittent exceptions you get when system services are under stress. The best approach is to make a guess and simulate them by "mocking", then extend the module tests when errors show up in production.
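Our module tests were written with NUnit; the same mocking idea is sketched here in Python with the standard `unittest.mock` module. `TransientServiceError` and `fetch_with_retry` are hypothetical: the point is that a mock lets a module test reproduce a guessed-at intermittent failure on demand, and a new case can be added each time production reveals another one.

```python
import unittest
from unittest import mock


class TransientServiceError(Exception):
    """Hypothetical error raised when a stressed system service times out."""


def fetch_with_retry(service, key, attempts=3):
    """Call service.get(key), retrying on transient failures (illustrative code under test)."""
    for attempt in range(attempts):
        try:
            return service.get(key)
        except TransientServiceError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure


class FetchWithRetryTest(unittest.TestCase):
    def test_recovers_from_one_intermittent_failure(self):
        service = mock.Mock()
        # First call fails the way a stressed service might; the second succeeds.
        service.get.side_effect = [TransientServiceError("timeout"), "value"]
        self.assertEqual(fetch_with_retry(service, "k"), "value")
        self.assertEqual(service.get.call_count, 2)

    def test_gives_up_after_repeated_failures(self):
        service = mock.Mock()
        # Every call fails, so the error must eventually reach the caller.
        service.get.side_effect = TransientServiceError("timeout")
        with self.assertRaises(TransientServiceError):
            fetch_with_retry(service, "k")
        self.assertEqual(service.get.call_count, 3)
```

Saved as, say, `test_retry.py`, it runs with `python -m unittest test_retry`. When a new kind of intermittent error turns up in production, it becomes one more `side_effect` scenario.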

The final improvement was in the agility of the development and release process. Fewer newly introduced bugs were making it out to the field, so we could start to develop and deploy faster, with less risk of causing problems for clients or our support teams.


Conclusions

Implementing automated test and continuous integration:

* both pay back on direct savings alone: CI in about 10 months, automated test in under two years.
* together they give significant synergy - the savings are "more than the sum of their parts"
* give many savings and benefits outside the test function, with significant benefits for development, support, operations and clients.
* give greatly increased agility of the development and deployment process. This allows new developments to be completed faster and with less business risk.

Overall, implementing CI and automated test was a major success in 2008. We released software approximately 3 times during 2008, with quality and productivity steadily improving. The next step was to speed the development process up to monthly iterations, which I'll describe in a later post.
