In Fooled By Testing I looked at some general concerns I've had about testing and what we can expect from it. In this part I'd like to expose myself to the Internet at large and let you in on my own recent failures.

Without further ado, here are the past ~5 bugs I've let slip into production:

Production Bug #1
One of the joys of 'crowd-sourcing' some aspects of our application is that we regularly need to merge the symptoms that our users report due to misspellings. When we do this, we automatically email the affected users so they know what's up. Normally this is merging 'tramors' into 'tremors', with 2 people reporting 'tramors' and 1500 reporting 'tremors'. It recently came up that we had 'headache' and 'headaches' as separate symptoms, each reported by thousands of users. The task fell to engineering: make it possible to do this merge without sending thousands of emails to users. I spent a good while writing pretty good tests that no email got sent, upgrading the merge tests in general, and I felt pretty confident in my code. We got to QA, realized that doing a merge like this was going to take 10+ minutes, and decided to move on and check it later.

Bottom line: Not once did we check to see if the site worked after we merged headache into headaches, the very functionality we were attempting to achieve.

Good News: The code I wrote worked. The merge happened successfully and no email was sent.
Bad News: The second the merge completed, all hell broke loose. You see, headaches is a primary symptom in our mood community and, long story and a bit of meta-programming magic later, it turns out that primary symptoms get methods like 'has_headache?' created for them. The merge blew this symptom away, thence there was no method, and by this declension we proceeded into the madness whereupon Hoptoad lit up like a Christmas tree.
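
For flavor, the metaprogramming is roughly this shape (a hypothetical sketch from me, not our actual code; Symptom.primary, symptom_names, and friends are invented):

    # Hypothetical sketch: predicate methods are generated from symptom rows
    # in the database when the class loads.
    class User < ActiveRecord::Base
      Symptom.primary.each do |symptom|
        # e.g. defines user.has_headaches? for the 'headaches' symptom
        define_method("has_#{symptom.name}?") do
          symptom_names.include?(symptom.name)
        end
      end
    end

Merge a primary symptom out of existence and its predicate method never gets generated on the next load, so every caller raises NoMethodError.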

Now you are perfectly justified in claiming that only an idiot would let this happen, but I declare that I am often an idiot. Furthermore, I do solemnly swear that I would have picked up on this without TDD. The great joy and efficiency of TDD, that I can exercise code through tests, is an A+ way to avoid rigorously seeing if the site actually works.

Production Bug #2
Added a CSS file that broke some styles in a different part of the site.
I grant you, this isn't an example of the evils of testing, but I will say that from a user's perspective CSS-fail can look an awful lot like site-fail. Writing a test case to prevent this would be a monster task with poor ROI. If you need a page to render in IE, you'd best load the page in IE.

Production Bug #3
Nobody merged the old production branch, so we regressed (tests existed... but they were part of what didn't get merged).
Again, not testing's fault, but what we're looking at here is 'Things that break production' and whether tests helped, and in this case they did not. Sometimes there is no substitute for manual labor.

Production Bug #4
Cannot Update Frozen Hash. This was some squirrelly ActiveRecord nuisance, the kind of thing that cropped up in Hibernate every 3 seconds but which ActiveRecord generally seems robust to. It's in the middle of a gross controller that no one is proud of. A coworker couldn't reproduce this bug using the site, but 'fixed' it by writing a functional test that produced the same exception. He then fixed that, but the bug was still there. Eventually we found a way to exercise it using the app and then exorcised it, but I would argue that we got a false sense of 'fixedness' from that test.
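
To make the trap concrete, the test looked something like this (a hypothetical reconstruction; the controller, fixtures, and params are invented):

    # Hypothetical reconstruction: this test originally raised the same
    # "can't modify frozen Hash" exception, but via a different code path
    # than a real browser request takes.
    class TreatmentsControllerTest < ActionController::TestCase
      test "updating a treatment does not blow up" do
        put :update, :id => treatments(:one).id, :treatment => { :dose => "10mg" }
        assert_response :redirect
      end
    end

Making that test pass felt like a fix, but the path that actually froze the hash in production was never exercised, so the bug survived.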

Production Bug #5
Performance disaster when searching for multiple treatments.
I know, I know, nobody ever said unit testing was performance testing, and technically this got stopped just short of production, but it got darn close and it was another episode of me writing my tests, thinking all was well, and not testing the app as thoroughly as I should have.

Production Bug #6
Tested: User.for_disease(disease), but actual form submissions looked like User.for_disease([disease]). Turns out that array was bad news and could have had a major effect in production, though thankfully no one actually used the feature involved.
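
In code, the mismatch looked roughly like this (hypothetical names; the real scope and fixtures differed):

    class UserTest < ActiveSupport::TestCase
      # What the tests exercised: a single disease object.
      test "finds users for a disease" do
        assert_equal [users(:alice)], User.for_disease(diseases(:als))
      end
    end

    # What the form submission effectively did: the same call with the
    # argument wrapped in an Array, a shape the tests never covered.
    User.for_disease([diseases(:als)])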

Bottom line: It is astoundingly easy to write tests that seem to exercise the full functionality of the code, but for which subtle differences in initial conditions have catastrophic effects. (See Atmospheric Disturbances for more info on the profound psychological effects of perturbations in initial conditions :)


So is testing dangerous?
Well no, obviously not, but also yes a little.

Let's take a little break from development work and ask a similar question:
"Were the past 20 years of sharpe's ratio's and Case/Schiller financial analytics dangerous?"
Of course not. Before these tools existed, financial holdings were opaque, and that opacity was indeed dangerous. Once again, however, as we've all learned, the confidence those tools bred was dangerous too. Banks blew up in the '60s and '70s, but knowledge of imperfect information breeds caution. Knowing 50% more but having 90% more confidence is dangerous in the extreme. A 99% chance the site works means 3.6 days/year of the site not working (1% of 365 days is about 3.65 days).

Obviously most of our Rails apps are not 'too big to fail' (besides Twitter), but as confidence rises and paranoia decreases, the chance of a killer bug grows sharply. If our profession is to avoid a backlash against the reliability of online applications, we cannot simply put all our eggs in the automated-testing basket.

Let's say it 3 times together:

Unit-Testing doesn't find bugs
Unit-Testing doesn't find bugs
Unit-Testing doesn't find bugs


Bottom line: Saying 'we can trust this, it's been unit tested' is a lot like saying 'what could go wrong? That debt is insured by AIG.'