work-blog/articles/published/The-Various-Uses-Of-Vegetable-Condiments-In-Testing.md
Gregory Gauthier da44ea30f1 refactor(structure): reorganize articles and assets directories
Move drafts to articles/drafts, articles to articles/published, and assets to general and memes subdirs.
2026-04-07 15:18:51 +01:00


In my time as a software tester and automation engineer, I’ve seen all manner of tragic misuses of engineering methodologies and their associated toolsets. Agile itself is one I mentioned in my previous post. Second on that list is easily the Cucumber (aka Gherkin) toolset.

I’ve seen it used as a unit testing framework in lieu of the supplied library of the target language, as plain text documentation for a user application, as an API client for a backend application, and as a management report generator for engineering managers. As bad as all those things might sound, the worst misuse of the tool, I think, is its employment as a plain-English procedural DSL for end-to-end tests. Nothing raises my hackles more than to see long strings of “and then when” chained together in a meandering play-by-play narrative of a tester’s stream of consciousness. That’s not what Cucumber is for.

Let’s answer the obvious question that arises from my last complaint: What, exactly, is Cucumber for? If you read the original creator’s blog post about BDD from September of 2006, a few things pop out. First, it was a response to several problems Dan North saw in the attempt to implement TDD in the early days. Second, testing problems weren’t ultimately the problems he was trying to solve. In other words, the problems he encountered with TDD stemmed from a more fundamental problem with the way software was being designed and built. The problem with TDD was that it was downstream from where the real problem lay. In short: it was still just one more attempt to mitigate problems with overall software quality by tinkering with the way tests were written.

What he invented to deal with the deeper problem was something he called Behaviour Driven Development. It is meant to complement the Agile story pattern, by providing a way to describe the behaviour of an application when a user engages in a discrete interaction with the application. It is, fundamentally, a design methodology, not a testing tool. To implement the methodology, Dan North constructed a language that could be used to describe a behavioural design specification. That language is “Gherkin”, and the libraries that interpret it are known as “Cucumber”.

Cucumber, therefore, is for improving the quality of your software product by focusing on the way you design the behaviour of its human-facing interfaces. It is not a testing tool; it is a design tool that just happens to include the ability to execute the design requirements. That design tool is meant to facilitate the Behaviour Driven Development methodology. However, as the Cucumber site itself rightly points out, you may still get some value out of Cucumber as a testing tool, without necessarily doing BDD:

Just because you're using Cucumber, doesn't mean you're doing BDD. There's much more to BDD than using Cucumber.

But I think it’s also the case that getting value out of Cucumber as a testing tool requires understanding how it’s meant to be used in the BDD methodology. In fact, I’d go so far as to say that if you don’t understand this, then you’re actually going to make your situation worse by using the tool. So, if you want to use it, here’s where to begin:

What Is A Behaviour?

What does it mean to design an application by describing its behaviour? Applications do not “behave” in the sense that a living thing behaves. Living things are self-motivated. They will act spontaneously, even when driven by biological urges. Nothing necessarily must happen to them, for them to decide to move this way or that, go to sleep or wake up, scratch your couch up, or chew your slippers.

Software applications, on the other hand, don’t do any of this. They will sit idly, until the battery on your laptop dies, quietly waiting for you to do something to them. If the action you take is something the software is designed to notice, it will offer up the defined response to your action. In a word, a behaviour is a discrete, single cause-and-effect event.

For example, if you have a deflated balloon, when you blow air into the balloon, the balloon expands. Notice the elements of that example: (a) a balloon that is in a certain condition, (b) a catalysing event, and (c) a balloon in a new condition after the event. Behaviour then, in the context of BDD, is the description of a cause-and-effect relationship between a software application and a user, at the point of an interaction.

Another way to put this is to call it a described “state transition”. In fact, Bob Martin has famously explained how Cucumber is not a plain-English procedural DSL, but a plain-English finite state machine specification:

If the Gherkin requirements are complete, then they describe the complete state machine of the system, and the complete test suite for the system.

And what does that look like, to Uncle Bob? Like this:

GIVEN that we are in state S1 WHEN we receive event E1 THEN we transition to state S2
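To make the state-machine reading concrete, here is a minimal Python sketch of a transition table in which every entry is one Given/When/Then triple. The state and event names are my own invention for illustration (they don’t come from Martin’s post or from any real application), and the table is deliberately tiny.

```python
# Hypothetical states and events, invented purely for illustration.
TRANSITIONS = {
    # (GIVEN state, WHEN event): THEN state
    ("dashboard", "navigate_to_profile"): "profile_page",
    ("profile_page", "change_display_name"): "profile_page_updated",
    ("profile_page", "navigate_to_dashboard"): "dashboard",
}


def transition(state: str, event: str) -> str:
    """Return the state reached from `state` when `event` occurs."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # An undefined pair means the specification says nothing about it.
        raise ValueError(
            f"no transition defined for {event!r} in state {state!r}"
        ) from None


def scenarios() -> list[str]:
    """Render each table entry as a one-line Given/When/Then statement."""
    return [
        f"GIVEN {given} WHEN {when} THEN {then}"
        for (given, when), then in TRANSITIONS.items()
    ]


if __name__ == "__main__":
    for line in scenarios():
        print(line)
```

Each row of the table is one discrete behaviour. If the table were complete, the list produced by `scenarios()` would be both the specification and the test suite, which is the point of the quote above.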

How Do I Get To The Museum?

Anyone who has worked in test automation over the last 15 years will instantly recognise the familiar “Given, When, Then” keywords in that transition example. What they won’t recognise is the disciplined approach to the use of those terms. What Martin outlines in his example is a discrete transition event. A system in state S1. A catalysing event E1. A system in state S2. That is not at all typical of the way Cucumber (aka “Gherkin”) has actually been deployed in the field. The broader reasons for this will be examined later, but the immediate reason is that everyone who approaches this tool approaches it the same way they approach singing.

When someone asks you “do you know how to sing?” or “can you sing row, row, row your boat?”, you will answer yes. You will do this because, as everybody knows, if called to do so, it’s just a matter of opening your mouth and blowing some air past your vocal cords. But, as we all also know, not everyone who “can sing” gets a position at La Scala.

Similarly, you would have no problem telling me what the words “given”, “when”, and “then” mean. They are simple, very common words in the English language, and they are used all the time to speak about things that come one after the other. So, of course, you will tell me you know how to use them in a Cucumber scenario. Doesn’t everyone?

The following screenshot is typical in the testing field, I have found, of what those sorts of assumptions produce:

This is obviously not a description of a discrete state transition. What the author of this scenario has done, is to describe his personal journey through the application to some terminal end point. In effect, the author has given you a procedure for how to get to the museum: “Head straight down Main, turn left at the Tesco, watch out for the big sign on your left, keep left, then head south to the round-about and take the third exit, and then just before the gas station make a hard-right then go up the ramp….”, and so on.

Sure, you might say, but is this really a problem? After all, it describes a complete user journey, and user journeys are the ultimate test of the usability of the application.

This argument fundamentally misunderstands the nature of the tool. It is excusable, because (as I’ve already outlined) the interface is plain English, and we are already predisposed to think in linear terms because that’s how we live our lives: “…and then I took a shower, and then I brushed my teeth, and then I got dressed…”. It is true that there are (and should be) procedures involved in the implementation of the executable instance of a proper Gherkin spec. But those procedures are to be found in the so-called “glue” underlying the Gherkin: in other words, in the programming language bound to your Gherkin specs through your Cucumber library (whether Java, or Python, or Ruby, or whatever).
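As a rough illustration of where the procedure belongs, here is a small Python sketch of the glue behind a single declarative step. The `FakeApp` class, its methods, and the element names are all hypothetical stand-ins I made up for the example; a real suite would drive a browser or device through whatever driver library you use.

```python
class FakeApp:
    """Stand-in for a UI driver; it records actions instead of driving a UI."""

    def __init__(self) -> None:
        self.page = "dashboard"
        self.actions: list[str] = []

    def click(self, element: str) -> None:
        self.actions.append(f"click:{element}")

    def wait_for(self, page: str) -> None:
        self.actions.append(f"wait:{page}")
        self.page = page


def navigate_to_profile_page(app: FakeApp) -> None:
    """Glue for the single Gherkin step 'I navigate to my profile page'.

    The click-by-click procedure lives here, in code, where it can be
    refactored freely without touching the wording of the specification.
    """
    app.click("user-menu")
    app.click("profile-icon")
    app.wait_for("profile")


if __name__ == "__main__":
    app = FakeApp()
    navigate_to_profile_page(app)
    print(app.page, app.actions)
```

The Gherkin reader sees one event; the three low-level actions stay out of the spec. If the profile icon moves to a different menu next sprint, only this function changes.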

The Cost Of Discipline

There is another reason why not knowing how to use Gherkin well is somewhat excusable. The notion of a finite state machine is a concept that comes to us from academic mathematics and computer science. Many of my colleagues over the years, while competent in their own right, were not academically trained computer scientists (indeed, neither am I). If you look the concept up on the internet, the explanations you’ll find will not look even remotely similar to what is found in the Gherkin syntax. So, really, kudos to Bob Martin for recognising it back in 2008.

Once recognised, though, we would do well to apply the concept carefully and consistently to Cucumber. This dispute is more than just an abstract debate about the placement of BDD in an academic theory of engineering practices. As any trades contractor can tell you, the misuse of tools can be incredibly costly (sometimes even deadly). And there are many reasons to think this of Cucumber.

For starters, linear stream-of-consciousness instruction lists like the one shown in the screenshot above make it next to impossible to normalise your underlying code. Every new scenario “journey” is a whole new lexicon of words and phrases that need to be married to some executable script beneath the Gherkin. With a well-crafted state-machine approach to scenarios, on the other hand, the potential for reusable code increases dramatically. Consider the following example:

Scenario: The User Views His Profile 
  Given I launch the application 
   When I enter my userid 
    And I enter my password 
    And I click login 
    When the application loads 
    And I click on my profile icon 
    Then the application show me my profile 
    And I can see my details 

Scenario: The User Updates His Profile 
  Given the application is already running 
    And I have successfully logged in 
   When I can see the dashboard 
   Then I can click on my profile icon 
    And I can see my details 
   Then I edit my display name 
    And I click save

This example is a typical representation of what can be found in the field. Note immediately three things. First, the “view” scenario has more steps than the “edit” scenario. Second, the application launch and login processes are rendered in completely different language in each. Third, there are further subtle differences in the “click on my profile” steps in each, as well as other minor differences.

Each one of these differences is going to have underlying knock-on effects. Each line of the scenario is a “hook” to a piece of code that executes in the background. But because the language is so different between the scenarios, each one will need its own unit of code to execute what is essentially the same function. If your testers are writing their own execution code, then this will cause them to spend more time writing code than testing. If your developers are writing the execution code, then you’re robbing them of an opportunity to optimise the test suite.

Compare the above to this:

Background: Prepare the application 
  Given The application is running 
    And A standard user is logged in 

Scenario: The user views his profile 
  Given I am at the dashboard 
   When I navigate to my profile page 
   Then I can see my user details 

Scenario: The user edits his profile display name 
  Given I am at the dashboard 
   When I navigate to my profile page 
    And I change my display name to "Yosemite Sam" 
   Then I can see "Yosemite Sam" on my profile page

In this example, the scenarios are supported by a background stage that rolls the login into a single block. The background clause won’t be ideal in all cases, but for many scenarios, this is a good technique for eliminating repetition in your Gherkin, making it much easier to maintain. What’s more, the reduction in repetition reduces the likelihood of a proliferation of dialects for the same functionality (“the application is running”, “the application is loaded”, “the application has started”, etc).
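One crude way to put a number on that proliferation is to count how many distinct step phrasings each version of the feature contains, since each distinct phrasing needs something in the glue to match it. The following Python sketch does that for the two examples above. The regular expression is my own simplification of Gherkin’s step keywords, not a real Gherkin parser, and it treats every differently worded step as a separate phrase.

```python
import re

MESSY = """
Scenario: The User Views His Profile
  Given I launch the application
   When I enter my userid
    And I enter my password
    And I click login
    When the application loads
    And I click on my profile icon
    Then the application show me my profile
    And I can see my details

Scenario: The User Updates His Profile
  Given the application is already running
    And I have successfully logged in
   When I can see the dashboard
   Then I can click on my profile icon
    And I can see my details
   Then I edit my display name
    And I click save
"""

CLEAN = """
Background: Prepare the application
  Given The application is running
    And A standard user is logged in

Scenario: The user views his profile
  Given I am at the dashboard
   When I navigate to my profile page
   Then I can see my user details

Scenario: The user edits his profile display name
  Given I am at the dashboard
   When I navigate to my profile page
    And I change my display name to "Yosemite Sam"
   Then I can see "Yosemite Sam" on my profile page
"""

# Simplified matcher: a step keyword followed by the step text.
STEP = re.compile(r"^\s*(?:Given|When|Then|And|But)\s+(.*\S)\s*$")


def step_texts(feature: str) -> list[str]:
    """Return the text of every step line, with the keyword stripped."""
    return [m.group(1) for line in feature.splitlines() if (m := STEP.match(line))]


def summarise(feature: str) -> tuple[int, int]:
    """Return (number of step lines, number of distinct step phrasings)."""
    steps = step_texts(feature)
    return len(steps), len(set(steps))


if __name__ == "__main__":
    print("messy:", summarise(MESSY))
    print("clean:", summarise(CLEAN))
```

By this count the first version has 15 step lines and 14 distinct phrasings, with only “I can see my details” shared between its two scenarios. The second version has 9 step lines and 7 distinct phrasings, and the two scenarios share most of their vocabulary.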

In addition to the background stage, notice also that each shared step in the scenarios is worded identically: “I am at the dashboard”, “I navigate to my profile page”, and so forth. These usages ensure that there need be only one piece of glue code for each of these steps, in any scenario where they are used, further reducing the maintenance issues.
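Here is a minimal Python sketch of why identical wording pays off in the glue. The `step` decorator and `run` function are a toy registry I wrote for illustration; they imitate the general shape of step definitions but are not the API of Cucumber, behave, or any other real library, and the step bodies only update a dictionary rather than drive an application.

```python
import re

# Toy registry: a list of (compiled pattern, function) pairs.
STEP_DEFINITIONS = []


def step(pattern):
    """Register a function as the glue for every step matching `pattern`."""
    def register(func):
        STEP_DEFINITIONS.append((re.compile(pattern), func))
        return func
    return register


@step(r"^I am at the dashboard$")
def at_dashboard(ctx):
    ctx["page"] = "dashboard"


@step(r"^I navigate to my profile page$")
def go_to_profile(ctx):
    ctx["page"] = "profile"


@step(r"^I can see my user details$")
def see_details(ctx):
    assert ctx["page"] == "profile"


@step(r'^I change my display name to "([^"]*)"$')
def change_display_name(ctx, name):
    ctx["display_name"] = name


@step(r'^I can see "([^"]*)" on my profile page$')
def see_display_name(ctx, name):
    assert ctx["page"] == "profile" and ctx["display_name"] == name


def run(steps, ctx=None):
    """Execute each step text with the first definition that matches it."""
    ctx = {} if ctx is None else ctx
    for text in steps:
        for pattern, func in STEP_DEFINITIONS:
            match = pattern.match(text)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step definition matches: {text!r}")
    return ctx


VIEW_PROFILE = [
    "I am at the dashboard",
    "I navigate to my profile page",
    "I can see my user details",
]
EDIT_DISPLAY_NAME = [
    "I am at the dashboard",
    "I navigate to my profile page",
    'I change my display name to "Yosemite Sam"',
    'I can see "Yosemite Sam" on my profile page',
]

if __name__ == "__main__":
    run(VIEW_PROFILE)
    print(run(EDIT_DISPLAY_NAME))
```

Five definitions serve all seven scenario steps, and a third scenario that starts at the dashboard and visits the profile page would need no new glue for those two steps. A step worded differently, such as “I click on my profile icon”, would raise `LookupError` here until someone wrote yet another definition for it.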

Finally, and most importantly, notice that the scenarios are structured state transitions. There is a well-defined starting condition, a discrete set of catalysing events, and a clearly defined expected end state. Regardless of how the glue code makes each of these steps happen, these test scenarios make it extremely clear what our expectations are for each starting condition coupled to each catalysing event.

One implicit lesson we can take from this, is that a scenario is not a user journey. A scenario is a segment of a user journey that we believe to be important enough to test. It is a state transition from one set of assumed conditions up to the point in the user journey where we are, to another set of outcome conditions expected after a user interaction: “I navigate to my profile page”, or “I change my display name to something”.

This is just one instance of how Gherkin scenarios like these could be improved. Much more could be said (for example, who am “I”, in these scenarios?). However, there’s simply not enough room to go on. One could write an entire book on Cucumber best practices like this (and it shocks me that nobody has yet). But suffice it to say here: the tighter and more terse your Gherkin, the better your behavioural specification will be, and the less likely you are to throw the entire effort out as a complete waste of company resources (something I have seen happen numerous times).

All The Wrong Reasons

One of the side-effects of the proliferation of Cucumber without the BDD methodology is that other purposes for Cucumber began to proliferate as well. I mentioned some of them at the beginning. But there is one purpose that has turned Cucumber into a weapon opposed to its own creator’s original end goal.

In many shops, Cucumber was promoted by siloed test managers who had convinced themselves that Cucumber was a means by which they could get automated tests without needing code-competent testers.

In such shops, testers wrote all the scenarios without understanding how the code that executed them worked, and entry-level developers wrote all the glue and harness code without understanding what the Cucumber scenarios were actually for. In every case in my experience, this has resulted in behemoth testing projects that have little or nothing at all to do with the applications under test. The daily goal of testers becomes to constantly nag junior developers about testing dashboards that have turned red for one reason or another. Entire internal organisations end up devoted to nothing but making the machine that turns green stay green, ultimately costing the institution orders of magnitude more money than if they’d just had a few small cross-discipline teams working on the application together, without any automation.

Conclusion: A Solution In Search of A Problem

To re-quote the Cucumber site:

Many people use Cucumber to do test automation, to check for bugs after the code has already been implemented. That's a perfectly reasonable way to do test automation, but it's not BDD.

I would take this a step further. If you’re not “doing BDD”, I would highly recommend that you not use Cucumber or the Gherkin language. Contra the Cucumber assertion, the experience in the field is that writing Gherkin after the fact has been a near disaster, relative to the goals that methodologies like Agile and BDD set out.

Adding the additional layer of a plain-language DSL, even just as (perhaps especially as) a test automation tool, opens up a nearly irresistible landscape of temptations to bad design and development habits, and even worse organisational choices.

If your shop is not working well, exploring new processes and methodologies can be an opportunity for significant improvement. In such a situation, BDD might just be the cure you’re looking for. If it is, then a disciplined approach to your Gherkin specifications will also naturally follow as part of the BDD process.

But if your shop is working well, then introducing disruptive new practices can completely derail your team. The half-way step of adopting the tools of that methodology without the methodology itself may seem like a “perfectly reasonable way to do test automation” for some. But in my experience, it has been nearly as chaotic as introducing the entire methodology to an already mature and stable shop.

If you have competent developers, they will know how to write tests. If you have a healthy, functional development process, then those tests will naturally align with design and requirement goals.

If you have competent test engineers, they will know how to properly test your applications, and if you have a healthy, functional development process, those testers will contribute to design and requirements early on, to ensure testability.

If you find that neither of these things is true, then Cucumber on its own is probably not going to help you.