Code Coverage Is Like a Bloom Filter

Bloom filters are a really interesting data structure. Conceptually, they’re similar to a set, since their primary operation is membership testing. Compared with a conventional set, a Bloom filter uses far less memory, but there’s a tradeoff: it’s a probabilistic data structure, meaning it doesn’t give you a precise answer. With a conventional set, the answer to the question “is this thing a member of the set?” is a decisive “yes” or “no”. With a Bloom filter, the answer is either “no” or “maybe”.
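To make that concrete, here’s a minimal sketch of a Bloom filter in Python, assuming a fixed-size bit array and a few salted hashes standing in for independent hash functions (a real implementation would size both based on the expected item count and the acceptable false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _indexes(self, item):
        # Simulate k independent hash functions by salting one hash.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i] = True

    def might_contain(self, item):
        # Any unset bit means "definitely not present". All bits set
        # means only "maybe": other items could have set the same bits.
        return all(self.bits[i] for i in self._indexes(item))

bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True, i.e. "maybe" (here, actually present)
print(bf.might_contain("bob"))    # almost certainly False: "definitely not"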

I recently had a realization: Bloom filters are also the perfect metaphor for thinking about code coverage. By code coverage, I mean tools that run your test suite (you have tests, right?) and report which lines of your code are and aren’t executed during those tests.
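As a rough illustration, here’s what that measurement looks like with Python’s coverage.py used programmatically (most people use its command-line interface instead; `clamp` is just a hypothetical stand-in for code under test, and the example assumes `pip install coverage`):

```python
import coverage

def clamp(x, low, high):
    if x < low:
        return low
    if x > high:
        return high
    return x

cov = coverage.Coverage()
cov.start()
clamp(5, 0, 10)  # only the "already in range" path runs
cov.stop()
cov.save()
cov.report(show_missing=True)  # lists this file with the line numbers that never ran
```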

Code coverage, as a metric, is somewhat fetishized in the developer community. Open source projects proudly display their coverage stats in READMEs, and it’s not uncommon for teams to designate a target percentage for code coverage. We assume that the higher the number, the better tested the software. That assumption is wrong.

Code coverage only measures which lines are executed, and that makes it hilariously easy to game: just don’t write any asserts. Even with the best of intentions, it’s all too easy to cover every line while still failing to test important edge cases.
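Here’s a hypothetical pytest-style example of exactly that: a test that executes every line of the function under test, earns 100% coverage, and verifies absolutely nothing:

```python
def discount(price, percent):
    if percent < 0 or percent > 100:
        raise ValueError("percent out of range")
    return price * (1 - percent / 100)

def test_discount():
    # Every line of discount() executes, so coverage reports 100%...
    discount(100, 10)
    try:
        discount(100, 150)
    except ValueError:
        pass
    # ...but there isn't a single assertion. Change the formula to
    # `price * percent` and this test still passes.
```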

Getting high coverage is easy; it’s writing good tests that’s hard.

The practice of coverage targets in particular strikes me as a bad idea, and not just because of Goodhart’s law (when a measure becomes a target, it ceases to be a good measure). Whatever number you pick will be arbitrary and hard to justify (“what percentage of the app are we okay with not working?”). So, 100% coverage, then? Assuming that’s even possible (there’s almost always at least a little untestable code), chasing 100% will spawn a lot of low-value tests that create an unnecessary maintenance burden, and you’ll still have the same problem: having lots of tests does not mean those tests are any good.

Now, don’t let this give you the impression that code coverage is useless, because it isn’t; it’s just not a useful tool for evaluating the quality of your tests. There are tools that can do that, like mutation testing, but code coverage is not one of them. Where code coverage shines is in identifying and patching holes in your test suite. Untested code isn’t always a problem, but it usually is, and having a tool that can quickly locate those problem areas is extraordinarily valuable, especially when inheriting a system with spotty testing.
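To see why mutation testing gets at test quality in a way coverage can’t, here’s a toy version of the idea with a hand-written mutant (real tools like mutmut or PIT generate and run the mutants for you; the function names here are made up for illustration):

```python
def is_adult(age):
    return age >= 18

def is_adult_mutant(age):
    return age > 18  # the "mutation": >= became >

def weak_test(fn):
    # 100% line coverage of either version, but it never probes
    # the boundary, so the mutant slips through ("survives").
    assert fn(30)
    assert not fn(5)

def strong_test(fn):
    # Testing the edge case "kills" the mutant.
    assert fn(18)

weak_test(is_adult)
weak_test(is_adult_mutant)     # passes: the weak test can't tell them apart
strong_test(is_adult)          # passes
# strong_test(is_adult_mutant) # would fail, exposing the weak spot
```

A surviving mutant is direct evidence that a behavior change goes undetected, which is exactly the information a coverage percentage can’t give you.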

Just like a Bloom filter, if you ask code coverage the question “is this code well-tested?”, the answer you get is either “no” or “maybe”.