
---
title: "The Various Uses Of Vegetable Condiments In Testing"
date: 2025-06-02
topics: [agile, craft]
related:
- Agile-Or-Whatever-You-Call-It.md
abstract: >
  Cucumber is a design tool for behaviour-driven development, not a
  testing framework, and misusing it as one makes your situation worse,
  not better.
---
In my time as a software tester and automation engineer, I've seen all manner of tragic misuses of engineering methodologies and their associated toolsets. Agile itself is one I mentioned in my previous post. Second on that list is easily the Cucumber (aka Gherkin) toolset.
I've seen it used as a unit testing framework in lieu of the supplied library of the target language, as plain text documentation for a user application, as an API client for a backend application, and as a management report generator for engineering managers. As bad as all those things might sound, the worst misuse of the tool, I think, is its employment as a plain-English procedural DSL for end-to-end tests. Nothing raises my hackles more than to see long strings of “and then when” chained together in a meandering play-by-play narrative of a tester's stream of consciousness. That's not what Cucumber is *for*.
Let's answer the obvious question that arises from my last complaint: What, exactly, *is* Cucumber for? If you read the [original creator's blog post about BDD from September of 2006](https://dannorth.net/introducing-bdd/ "https://dannorth.net/introducing-bdd/"), a few things pop out. First, it was a *response to* several problems Dan North saw in the attempt to implement TDD in the early days. Second, *testing problems* weren't ultimately the problems he was trying to solve. In other words, the problems he encountered with TDD stemmed from a more fundamental problem with the way software was being *designed and built*. The problem with TDD was that it was downstream from where the real problem lay. In short: it was *still* just one more attempt to mitigate problems with overall software quality by tinkering with the way tests were written.
What he devised to deal with the deeper problem was something he named *Behaviour Driven Development*. It is meant to complement the Agile story pattern by providing a way to *describe* the *behaviour* of an application when a user engages in a discrete interaction with it. It is, fundamentally, a *design* methodology, not a testing tool. To implement the methodology, Dan North and his collaborators settled on a structured “Given, When, Then” template that could be used to *describe a behavioural design specification*. That template was later formalised as the language “Gherkin”, and the tool that interprets it is known as “Cucumber”.
Cucumber, therefore, is *for* improving the quality of your software product by focusing on the way you *design the behaviour of its human-facing interfaces*. It is not a testing tool; it is a *design tool* that just happens to include the ability to *execute the design requirements*. That design tool is meant to facilitate the [*Behaviour Driven Development* methodology](https://cucumber.io/docs/bdd/ "https://cucumber.io/docs/bdd/"). However, as the Cucumber site itself rightly points out, you may still get some value out of Cucumber as a testing tool, without necessarily *doing BDD*:
> *Just because you're using Cucumber, doesn't mean you're doing BDD.* [*There's much more to BDD than using Cucumber*](https://cucumber.io/docs/bdd "https://cucumber.io/docs/bdd")*.*
But I think it's also the case that getting value out of Cucumber as a testing tool requires understanding how it's meant to be used in the BDD methodology. In fact, I'd go so far as to say that if you don't understand, then you're actually going to *make your situation worse* by using the tool. So, if you want to use it, here's where to begin:
## What Is A Behaviour?
What does it mean to design an application by describing its behaviour? Applications do not “behave” in the sense that a living thing behaves. Living things are self-motivated. They will act spontaneously, even when driven by biological urges. Nothing necessarily must *happen to them*, for them to decide to move this way or that, go to sleep or wake up, scratch your couch up, or chew your slippers.
Software applications, on the other hand, don't do any of this. They will sit idly, until the battery on your laptop dies, quietly waiting for you *to do something to them*. If the action you take is something the software is designed to notice, it will offer up the defined response to your action. In a word, a **behaviour** is a discrete, single *cause-and-effect* event.
For example, if you have a deflated balloon and you blow air into it, the balloon expands. Notice the elements of that example: (a) a balloon that is in a certain condition, (b) a *catalysing event*, and (c) a balloon in a new condition after the event. Behaviour, then, in the context of BDD, is the description of a cause-and-effect relationship between a software application and a user, at the point of an interaction.
Another way to put this, is to call it a described “[state transition](https://www.sciencedirect.com/topics/computer-science/state-transition "https://www.sciencedirect.com/topics/computer-science/state-transition")”. In fact, [Bob Martin has famously explained](https://blog.cleancoder.com/uncle-bob/2018/06/06/PickledState.html "https://blog.cleancoder.com/uncle-bob/2018/06/06/PickledState.html") how Cucumber is not a plain-English *procedural* *DSL*, but a plain-English [*finite state machine*](https://brilliant.org/wiki/finite-state-machines/ "https://brilliant.org/wiki/finite-state-machines/") specification:
> *If the Gherkin requirements are complete, then they describe the complete state machine of the system, and the complete test suite for the system.*
And what does that look like, to Uncle Bob? Like this:
> `GIVEN that we are in state S1 WHEN we receive event E1 THEN we transition to state S2`
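
One way to make that reading concrete is to treat each scenario as a single row in a transition table. The following is only a minimal sketch of the idea in Python, not anything taken from Martin's post or from Cucumber itself; the state names, event names, and function names are all hypothetical.

```python
# A minimal sketch: each scenario is read as one row of a transition table.
# All state and event names below are hypothetical, chosen for illustration.
from typing import Dict, Tuple

# (GIVEN state, WHEN event) -> THEN state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("logged_out", "submit_valid_credentials"): "dashboard",
    ("dashboard", "open_profile"): "profile_page",
    ("profile_page", "save_display_name"): "profile_page",
}


def transition(state: str, event: str) -> str:
    """Return the state the system should reach when `event` occurs in `state`."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # No row in the table means the behaviour was never specified.
        raise ValueError(
            f"no behaviour specified for {event!r} in state {state!r}"
        ) from None


def check_scenario(given: str, when: str, then: str) -> bool:
    """A scenario holds when its single transition lands on the expected state."""
    return transition(given, when) == then
```

Under these assumptions, `check_scenario("dashboard", "open_profile", "profile_page")` holds, while asking for `transition("logged_out", "open_profile")` raises an error, because no behaviour was ever specified for that pairing. A gap in the table is a gap in the specification.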
## How Do I Get To The Museum?
Anyone who has worked in test automation over the last 15 years will instantly recognise the familiar “Given, When, Then” keywords in that transition example. What they won't recognise is the disciplined approach to the use of those terms. What Martin outlines in his example is a *discrete transition event*. A system in state S1. A catalysing event E1. A system in state S2. That is not at all typical of the way Cucumber (aka “Gherkin”) has actually been deployed in the field. The broader reasons for this will be examined later, but the immediate reason is that everyone who approaches this tool approaches it the same way they approach singing.
When someone asks you “*do you know how to sing?*” or “*can you sing row, row, row your boat?*”, you will answer yes. You will do this because *as everybody knows*, if called to do so, it's just a matter of opening your mouth and blowing some air past your vocal cords. But, as we all **also** know, not everyone who “can sing” gets a position at [La Scala](https://www.teatroallascala.org/en/index.html "https://www.teatroallascala.org/en/index.html").
Similarly, you would have no problem telling me what the words “given”, “when”, and “then” mean. They are simple, very common words in the English language, and they are used all the time to speak about things that come one after the other. So, of course, you will tell me you know how to use them in a Cucumber scenario. *Doesn't everyone*?
The following screenshot is typical in the testing field, I have found, of what those sorts of assumptions produce:
<img title="" src="file:///Users/gregory.gauthier/Documents/BlogPosts/29c72b97-5bd3-4b15-bc41-037c6f10a944.png" alt="" width="859" data-align="center">
This is obviously not a description of a discrete state transition. What the author of this scenario has done, is to describe his personal journey through the application to some terminal end point. In effect, the author has given you a ***procedure*** for how to get to the museum: “*Head straight down Main, turn left at the Tesco, watch out for the big sign on your left, keep left, then head south to the round-about and take the third exit, and then just before the gas station make a hard-right then go up the ramp….*”, and so on.
Sure, you might say, but is this really a problem? After all, it describes a complete user journey, and user journeys are the ultimate test of the usability of the application.
This argument fundamentally misunderstands the nature of the tool. It is excusable, because (as I've already outlined) the interface is plain English, and we are already predisposed to think in linear terms because that's how we live our lives: “…*and then I took a shower, and then I brushed my teeth, and then I got dressed*…”. It is true that there are (and should be) procedures involved in the implementation of the *executable instance* of a proper Gherkin spec. But those procedures are to be found in the so-called “glue” underlying the Gherkin: in other words, in the programming language bound to your Gherkin specs through your Cucumber library (whether Java, or Python, or Ruby, or whatever).
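To illustrate where the procedure belongs, here is a hand-rolled sketch of what “glue” does, written in plain Python. It is emphatically not the API of any real Cucumber library: the `step` decorator, `run_step`, and the `FakeApp` class are hypothetical stand-ins, and the step wording is chosen only for illustration. The point is that the clicking and waiting live in the code, while the Gherkin line stays a single statement of condition, event, or outcome.

```python
# A hand-rolled sketch of "glue": step text bound to procedural code.
# NOT a real Cucumber API; `step`, `run_step`, and FakeApp are hypothetical.
import re
from typing import Callable, List, Tuple

STEPS: List[Tuple[re.Pattern, Callable[..., None]]] = []


def step(pattern: str):
    """Register a function as the glue for any step text matching `pattern`."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


class FakeApp:
    """Stand-in for the application under test; records what the glue does."""
    def __init__(self) -> None:
        self.page = "dashboard"
        self.log: List[str] = []

    def click(self, target: str) -> None:
        self.log.append(f"click {target}")

    def wait_for(self, page: str) -> None:
        self.log.append(f"wait {page}")
        self.page = page


@step(r"^I am at the dashboard$")
def at_dashboard(app: FakeApp) -> None:
    assert app.page == "dashboard"


@step(r"^I navigate to my profile page$")
def navigate_to_profile(app: FakeApp) -> None:
    # The play-by-play procedure lives here, not in the Gherkin.
    app.click("profile icon")
    app.wait_for("profile")


@step(r"^I can see my user details$")
def see_user_details(app: FakeApp) -> None:
    assert app.page == "profile"


def run_step(app: FakeApp, text: str) -> None:
    """Find the glue registered for `text` and execute it against `app`."""
    for pattern, func in STEPS:
        match = pattern.match(text)
        if match:
            func(app, *match.groups())
            return
    raise LookupError(f"no glue code for step: {text!r}")
```

Running the three step texts in order through `run_step` against a fresh `FakeApp` leaves the app on the profile page, with the click and the wait recorded in its log. A step with no registered glue fails loudly, which is roughly how the real libraries treat undefined steps.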
## The Cost Of Discipline
There is another reason why not knowing how to use Gherkin well is somewhat excusable. The notion of a finite state machine is a concept that comes to us from academic mathematics and computer science. Many of my colleagues over the years, while competent in their own right, were not academically trained computer scientists (indeed, neither am I). If you look the concept up on the internet, the explanations you'll find will not look even remotely similar to what is found in the Gherkin syntax. So, really, kudos to Bob Martin for recognising it back in 2018.
Once recognised, though, we would do well to apply the concept carefully and consistently to Cucumber. This dispute is more than just an abstract debate about the placement of BDD in an academic theory of engineering practices. As any trades contractor can tell you, the misuse of tools can be incredibly costly (sometimes even deadly). And there are many reasons to think this of Cucumber.
For starters, linear stream-of-consciousness instruction lists like the one shown in the screenshot above make it next to impossible to *normalise your underlying code.* Every new scenario “journey” is a whole new lexicon of words and phrases that need to be married to some executable script beneath the Gherkin. Whereas, with a well-crafted state-machine approach to scenarios, the potential for *reusable code* increases dramatically. Consider the following example:
```
Scenario: The User Views His Profile
  Given I launch the application
  When I enter my userid
  And I enter my password
  And I click login
  When the application loads
  And I click on my profile icon
  Then the application show me my profile
  And I can see my details

Scenario: The User Updates His Profile
  Given the application is already running
  And I have successfully logged in
  When I can see the dashboard
  Then I can click on my profile icon
  And I can see my details
  Then I edit my display name
  And I click save
```
This example is a typical representation of what can be found in the field. Note immediately three things: First, the “view” scenario has more steps than the “edit” scenario. Second, the application launch and login processes are rendered in completely different language in each. Third, there are further subtle differences in the “click on my profile” steps in each. There are other minor differences as well.
Each one of these differences is going to have underlying knock-on effects. Each line of the scenario is a “hook” to a piece of code that executes in the background. But because the language is so different between the scenarios, each one will need its own unit of code to execute what is essentially the same function. If your testers are writing their own execution code, then this will cause them to spend more time writing code than testing. If your developers are writing the execution code, then you're robbing them of an opportunity to optimise the test suite.
Compare the above to this:
```
Background: Prepare the application
  Given The application is running
  And A standard user is logged in

Scenario: The user views his profile
  Given I am at the dashboard
  When I navigate to my profile page
  Then I can see my user details

Scenario: The user edits his profile display name
  Given I am at the dashboard
  When I navigate to my profile page
  And I change my display name to "Yosemite Sam"
  Then I can see "Yosemite Sam" on my profile page
```
In this example, the scenarios are supported by a background stage that rolls the application launch and login into a single block. The background clause won't be ideal in all cases, but for many scenarios, this is a good technique for eliminating repetition in your Gherkin, making it much easier to maintain. What's more, the reduction in repetition means reducing the likelihood of proliferating dialects for the same functionality (“the application is running”, “the application is loaded”, “the application has started”, etc).
In addition to the background stage, notice also that the steps shared between the scenarios are worded identically: “I am at the dashboard”, “I navigate to my profile page”, and so forth. This consistency ensures that there need be *only one* piece of glue code for each of these steps in any scenario where they are used, further reducing the maintenance burden.
Finally, and most importantly, notice that the scenarios are *structured state transitions*. There is a well-defined starting condition, a discrete set of catalysing events, and a clearly defined *expected end state*. Regardless of **how** the glue code makes each of these steps happen, these test scenarios make it extremely clear what our expectations are for each starting condition coupled to each catalysing event.
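The reuse claim can be roughly checked by counting how many distinct step phrases each version of the example needs glue for. The sketch below is a crude approximation in plain Python, not a real Gherkin parser: it assumes one step per line and that a step is matched on its text regardless of the leading keyword. The function and variable names are hypothetical.

```python
# Rough count of distinct step phrases, i.e. pieces of glue code needed.
# Crude line-based parsing for illustration only; not a real Gherkin parser.
from typing import List

KEYWORDS = ("Given", "When", "Then", "And", "But")

JOURNEYS = """
Scenario: The User Views His Profile
  Given I launch the application
  When I enter my userid
  And I enter my password
  And I click login
  When the application loads
  And I click on my profile icon
  Then the application show me my profile
  And I can see my details
Scenario: The User Updates His Profile
  Given the application is already running
  And I have successfully logged in
  When I can see the dashboard
  Then I can click on my profile icon
  And I can see my details
  Then I edit my display name
  And I click save
"""

TRANSITIONS = """
Background: Prepare the application
  Given The application is running
  And A standard user is logged in
Scenario: The user views his profile
  Given I am at the dashboard
  When I navigate to my profile page
  Then I can see my user details
Scenario: The user edits his profile display name
  Given I am at the dashboard
  When I navigate to my profile page
  And I change my display name to "Yosemite Sam"
  Then I can see "Yosemite Sam" on my profile page
"""


def step_phrases(gherkin: str) -> List[str]:
    """Return the text of every step line, with its leading keyword removed."""
    phrases = []
    for line in gherkin.splitlines():
        words = line.strip().split(" ", 1)
        if len(words) == 2 and words[0] in KEYWORDS:
            phrases.append(words[1])
    return phrases


def glue_needed(gherkin: str) -> int:
    """Count distinct phrases: each one needs its own piece of glue code."""
    return len(set(step_phrases(gherkin)))
```

By this count, the journey-style version needs 14 pieces of glue for 15 steps, while the state-transition version needs 7 for 9. Two scenarios is a tiny sample, but the gap tends to widen as more scenarios share the same starting conditions and events.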
One implicit lesson we can take from this, is that *a scenario is not a user journey*. A scenario is a segment of a user journey that we believe to be important enough to test. It is a state transition from one set of assumed conditions up to the point in the user journey where we are, to another set of outcome conditions expected after a user interaction: “I navigate to my profile page”, or “I change my display name to something”.
This is just one instance of how Gherkin scenarios like these could be improved. Much more could be said (for example, who am “I”, in these scenarios?). However, there's simply not enough room to go on. One could write an *entire book* on Cucumber best practices like this (and it shocks me that nobody has yet). But, suffice to say here, the tighter and more terse your Gherkin, the better your behavioural specification will be, and the less likely you are to throw the entire effort out as a complete waste of company resources (something I have seen happen numerous times).
## All The Wrong Reasons
One of the side-effects of the proliferation of Cucumber without the BDD methodology is that other purposes for Cucumber began to proliferate as well. I mentioned some of them at the beginning. But there is one purpose that has turned Cucumber into a weapon opposed to its own creator's original end goal.
In many shops, Cucumber was promoted by siloed test managers who had convinced themselves that Cucumber was a means by which they could get automated tests without needing code-competent testers.
In such shops, testers wrote all the scenarios without understanding how the code that executed them worked, and entry-level developers wrote all the glue and harness code without understanding what the Cucumber scenarios were actually *for*. In every case in my experience, this has resulted in behemoth testing projects that have little or nothing at all to do with the applications under test. The daily goal of testers becomes constantly nagging junior developers about testing dashboards that have turned red for one reason or another. Entire internal organisations are devoted to nothing but making the machine that turns green stay green. Ultimately, this costs the institution orders of magnitude more money than if they'd just had a few small cross-discipline teams working on the application together, without any automation.
## Conclusion: A Solution In Search of A Problem
To re-quote the Cucumber site:
> *Many people use Cucumber to do test automation, to check for bugs after the code has already been implemented. That's a perfectly reasonable way to do test automation, but it's not BDD.*
I would take this a step further. If you're not “doing BDD”, I would highly recommend that you *not* use Cucumber or the Gherkin language. Contra the Cucumber assertion, the experience in the field is that writing Gherkin after-the-fact has been a near disaster, relative to the goals that methodologies like Agile and BDD set out.
Adding the additional layer of a plain-language DSL, even just as (perhaps especially as) a test automation tool, opens up a nearly irresistible landscape of temptations to bad design and development habits, and even worse organisational choices.
If your shop is not working well, exploring new processes and methodologies can be an opportunity for significant improvement. In such a situation, BDD might just be the cure you're looking for. If it is, then a disciplined approach to your Gherkin specifications will also naturally follow as part of the BDD process.
But if your shop *is* working well, then introducing disruptive new practices can completely derail your team. The half-way step of adopting the tools of that methodology without the methodology itself may seem like a “perfectly reasonable way to do test automation” for some. But in my experience, it has been nearly as chaotic as introducing the entire methodology to an already mature and stable shop.
If you have competent developers, they will know how to write tests. If you have a healthy, functional development process, then those tests will naturally align with design and requirement goals.
If you have competent test engineers, they will know how to properly test your applications, and if you have a healthy, functional development process, those testers will contribute to design and requirements early on, to ensure testability.
If you find that neither of these things is true, then Cucumber on its own is probably not going to help you.