DRY Rot

Every software developer is taught, or at least intuits, the DRY Principle: "Don't Repeat Yourself". Learning to program is practically synonymous with learning, early on, how to extract repeated functionality into shared code, oftentimes behind an invokable boundary such as a function, a shared library, or a network call. And so when a programmer first learns about the DRY Principle, it sort of just "clicks".

However, I've noticed a troubling trend in how DRY gets applied, one that I've increasingly had to call out and mitigate on teams. Ironically, I've gotten a bit tired of repeating myself. Besides, writing helps organize my own thoughts (the primary purpose of this blog), so these ramblings will scratch both itches.

What I am noticing is DRY overzealously (dare I say erroneously) applied. This is one of the most pernicious tendencies in software developers: it pushes a software system toward coupled, garbled spaghetti that is nearly impossible to reason about and, consequently, difficult to iterate on and change.

The operative word in "Software" is "soft". A software system's core value is that its code can be modified and changed to align with desired behavior.

That ability is so integral to the concept of "Software" that we even use an entirely different name for code whose core value is resistance to change: we call it "Firmware".

This is to say that if your software system's design makes it resistant to change in areas where it needs to change, or likely will need to, then you've made a fundamental misstep in that design.

Unfortunately, the DRY mnemonic has become reified such that it is commonly mistaken for the principle itself: that any duplication of code is a cardinal sin that must be avoided at all costs. This is simply wrong. The DRY Principle is getting at something more fundamental and worthwhile, but that thing isn't purely "code duplication bad". Indeed, the essence of the DRY Principle seems increasingly lost on folks who espouse "Don't Repeat Yourself" practically as religious dogma. More on the actual principle later. Before that, I'd like to spend a bit of time discussing what I believe contributed to this intuitive principle's devolution into doctrine.

class HeatedDebate extends BikeshedController implements Unresolvable

Software Developers love arguing about small details that ultimately don't matter:

  • Tabs vs. Spaces (and then further still with 2 spaces vs. 4 spaces)
  • Semicolons vs. no semicolons
  • This editor vs. that editor
  • This terminal vs. that terminal
  • Windows vs. Mac vs. Linux (and then further still with Debian vs. Fedora vs. Arch vs. yada yada yada)
  • More recently, Claude vs. Codex vs. Cursor vs. Amp vs. OpenCode vs. yada yada yada

This tendency to argue over insignificant details is so ubiquitous within our craft that we've dedicated a name to the phenomenon: bikeshedding. DRY scratches this itch to bikeshed. It encourages developers to argue over insignificant details, which is often a distraction from more important problems and, at its worst, a source of new and unnecessary ones.

Secondly, Software Developers love coming up with names for things. In other words, they love coming up with taxonomy. For this, I present a rambling:

A team of developers building a simple app to track office snacks spends six weeks deciding what to call the button that adds a snack.

"Add Snack" is too pedestrian. Someone proposes IngestEdibleEntity. Another insists it should be ProvisionConsumableResource to align with their "resource-oriented architecture." A third argues that snacks aren't strictly resources, they're "perishable assets" and lobbies for RegisterPerishableAsset.

This spawns a 47-message thread debating whether a snack is an Asset, a Provision, or an Edible. They create an interface called IConsumable, then an abstract base class AbstractEdibleProvisionableItem, then a factory called SnackFactoryProvider, and finally a SnackFactoryProviderFactory because you obviously need a factory to make the factory.

Meanwhile, they coin a new word, "snaxonomy", to describe their classification system for sorting snacks into SavoryNonPerishableModule and SweetVolatileShelfLifeModule. A design doc titled "Towards a Unified Theory of Office Nourishment" reaches eleven pages.

The button is never built. But the naming convention wiki is magnificent.

Another rambling about taxonomy:

There are only two hard problems in Computer Science:

Naming Things
Cache Invalidation
...& off-by-1 errors

-- Leon Bambrick

Don't misunderstand, properly naming things is important, but I agree with Bambrick that naming things is hard; I'd argue it is the hardest problem in Computer Science. I therefore believe that we ought to be very trepidatious about giving something a name, especially if that thing can go on just fine without one. The name we coin will almost certainly be wrong, if not now, then later. This is all the more likely if we name something early, when we have less information about what the thing we're naming actually is, if it actually is anything at all; we simply know not what we do. Then names encourage even more names, and it compounds into an absolute garbled soup of nouns.

Steve Yegge wrote a fantastic blog post on this phenomenon titled "Execution in the Kingdom of Nouns" that was profoundly formative for me when I first started building software back in 2010.

This early and eager taxonomy is precisely what dogmatic adherence to the DRY mnemonic encourages.

Finally, and I believe this to be the most unfortunate co-opting of all, DRY encourages recognizing patterns. This actually isn't bad, and it starts to get at the real principle behind DRY. The problem lies in an unwavering adherence to the solution prescribed by the mnemonic.

Recognizing patterns is core to our craft. Linus Torvalds has referred to this in software as "having good taste". All of the tendencies I've described so far stem from an ability, moreover a desire, to see patterns in our software. The problem arises when the DRY mnemonic is narrowly applied as the solution any time we recognize a pattern: "see pattern -> MUST dedupe". So many developers fall into this trap because deduplication feels like engineering. It feels productive, similar to how arranging data into the rows and columns of a 3NF database schema also feels like engineering. But it isn't, at least not most of the time.

To stay with that example: a database schema, including its degree of normalization, exists in service of a use case.

In other words, normalized form is a mechanism for enabling optimal interactions with the data with respect to an access pattern, not a property of the data itself.

In yet more words, "more normalized != more correct"; it depends on what you need your schema to enforce and enable.

This is a classic X/Y Problem – conflating the solution to a problem with the problem itself. In this case, it is taking a heuristic that exists to serve a goal and treating it as the goal.

It's one thing to recognize a pattern, but it's another to decide what to do about it. Just because you recognize a pattern doesn't make the pattern meaningful. It's akin to seeing Jesus' silhouette in your burnt toast and concluding that it must be a message from The Almighty. Applying DRY in this way abdicates the actual discernment we must employ as software engineers when building systems.

We call it Software "Engineering" not because it requires accreditation; anyone can build amazing software. We call it "Engineering" because we are employing the "engineering method".

The engineering method is an iterative approach to solving practical problems, with incomplete data. Unlike the scientific method, which seeks to find a definitive answer to a question, the engineering method is open-ended and focuses on building, testing, and refining plausible solutions to a problem.

Implicit in the engineering method is the concession that "all models are wrong, but some models are useful."

To put it succinctly: the problem with DRY overzealously applied is that it inevitably necessitates naming things – to actively engage and pursue the hardest problem in Computer Science. This results in a lot of wasted time, wasted speculation, and wasted engineering effort to implement and maintain, and then to wrangle the subsequent technical debt that arises from trying to work around all of the poorly fit abstractions.

I'm not suggesting that we don't pursue hard problems in Computer Science, but we ought not go unduly creating more hard problems where there aren't any. The software engineer's job is not merely to recognize a pattern, but to ask a more important question: "Is this pattern significant? Is it meaningful? Do these things merely look the same, or are they the same? And does that thing deserve a name?"

When Does Something Deserve a Name?

DRY is concerned with deduplicating meaningful decisions, not text. The overzealous developer sees duplicated code and assumes that it must be a single decision in two places, and thus needs to be unified. But identical text is not necessarily evidence of a shared decision. Two code blocks can be character-for-character identical and still encode two entirely independent decisions that merely happen to coincide in implementation today. Those code blocks ought to stay duplicated, allowing them to diverge, precisely because nothing actually binds them together except a coincidence of the present implementation.

For example, the intent of each code block could diverge. They encode decisions that look alike today, but either could change later. Once these code blocks are unified, a change to one rule implicitly changes the other; there's one place where there ought to be two (and in fact there were two, before DRY). A new problem is created where there previously wasn't one. In other words, the duplication was never real; the identical text was merely coincidence.
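
To make that concrete, here is a minimal Python sketch. The two rules, their names, and their limits are all hypothetical, invented purely for illustration: a username rule owned by product, and a bucket-name rule dictated by an imagined storage backend. Their bodies start out character-for-character identical, yet when one decision changes, only one function should change.

```python
# Hypothetical rules, invented for illustration only.

def is_valid_username(name: str) -> bool:
    """Product decision: how long a user's handle may be."""
    return 3 <= len(name) <= 20


def is_valid_bucket_name(name: str) -> bool:
    """Infrastructure decision: what an imagined storage backend accepts."""
    return 3 <= len(name) <= 20


# The tempting "DRY" refactor: one helper that both call sites would share.
def is_valid_name(name: str) -> bool:
    return 3 <= len(name) <= 20


# Later, product decides that handles may be up to 30 characters long.
# With the rules kept separate, exactly one decision changes:
def is_valid_username_v2(name: str) -> bool:
    """The revised product decision; the bucket rule is untouched."""
    return 3 <= len(name) <= 30


long_name = "a" * 25
print(is_valid_username_v2(long_name))  # the revised product rule accepts it
print(is_valid_bucket_name(long_name))  # the backend rule still rejects it
print(is_valid_name(long_name))         # the shared helper can only give one answer
```

Had both call sites shared is_valid_name, raising its limit to 30 would have silently loosened the bucket rule as well, and leaving it at 20 would have blocked the product change.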

They may diverge in deployment. One code block might need to live somewhere else, say, as part of a separate service. The misstep many teams make here is to extract the shared "decision" into a library. But this only smuggles the intent coupling back in under the guise of "engineering". Two services that share a library encoding a pseudo-decision are implicitly coupled, and in practice can no longer deploy independently, which violates the entire point of drawing a service's contextual boundary.

"Well that's why you publish it under SemVer." But versioning "solves" the divergence problem only by creating a pile of new ones. Where does it get published? If the library has to be private, you've just erected a security boundary where none existed before – one that reaches all the way down into local development and potentially all the way up into infrastructure, deployment, and hosting. If you've ever fought with an NPM_TOKEN or a private artifact registry, you know exactly what I am talking about.

It also begins to blur ownership. Who decides what goes into the shared library? Who decides when a version ships? Who decides its conventions and style? "I know, let's make a shared library to codify our conventions." And on it goes, into the abyss, bikesheds as far as the eye can see. Forced unification manufactures a need for coordination that did not exist before.

And the worst part is that it's likely all in vain, because the implicitly coupled services will still end up needing to be deployed in lock-step. At this point, you haven't built a service-oriented or a microservice architecture. You've instead built a distributed monolith, which combines the worst parts of both.

Something deserves a name when it is a decision. After spotting duplication, the DRY principle is really asking "is there a single decision here that ought to exist in one place?" A genuinely DRY abstraction names a piece of knowledge: a business rule, an invariant, a policy that has one rightful owner and one reason to change. There will be times when you come across duplication, and it is indeed a single decision. And even then, you might still want to hold off on deduplicating and naming the thing until you understand a bit more about the contextual boundary. Your software can typically tolerate far more duplication than it can poorly fitting abstractions.
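
By contrast, here is a sketch of duplication that does encode a single decision. The free-shipping policy, its threshold, and every name below are hypothetical. The point is only that the cart banner and the checkout total must never disagree, because they express one business rule with one owner and one reason to change, so that rule earns a name and a single home.

```python
from decimal import Decimal

# Hypothetical policy values, invented for illustration only.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")
FLAT_SHIPPING_FEE = Decimal("4.99")


def qualifies_for_free_shipping(subtotal: Decimal) -> bool:
    """The single home of the free-shipping decision."""
    return subtotal >= FREE_SHIPPING_THRESHOLD


def cart_banner(subtotal: Decimal) -> str:
    """Message shown in the cart; it defers to the policy above."""
    if qualifies_for_free_shipping(subtotal):
        return "You get free shipping!"
    remaining = FREE_SHIPPING_THRESHOLD - subtotal
    return f"Add ${remaining} more for free shipping."


def checkout_total(subtotal: Decimal) -> Decimal:
    """Amount charged at checkout; it defers to the same policy."""
    if qualifies_for_free_shipping(subtotal):
        return subtotal
    return subtotal + FLAT_SHIPPING_FEE


print(cart_banner(Decimal("45.00")))
print(checkout_total(Decimal("45.00")))
print(checkout_total(Decimal("50.00")))
```

If the threshold changes, both call sites are supposed to change together. That shared fate is what separates this case from two blocks that merely look alike.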

Coincidental duplication deserves nothing. Two blocks that look alike, but encode different decisions, are the same only in their current implementation, which will almost certainly need to diverge later.

Instead, I encourage engineering teams to follow the WET principle: "Write Everything Thrice". Besides being a jab at DRY, the WET principle helps to de-conflate "code deduplication" from what we're actually interested in deduplicating in our software: meaningful decisions. WET is not an argument for sloppiness or copy-pasta, but a recommendation for patience: let a thing appear two, three, ten times – the iterations are not what's important. Before you consider giving something a name, ask what decision that thing encodes and who owns that decision. If you can't answer cleanly, you might simply have two or three things that look alike, and ergo ought to stay separated and "duplicated". Focus on solving actual problems. Duplicate until it is painfully clear something deserves a name, and by that point you'll have much more information to inform its contextual boundary. Once you build it and name it, you'll get to delete a bunch of code too, which feels amazing, and you'll actually be devoting your engineering efforts to actual engineering.
