docs(articles): add frontmatter to drafts and update README

Standardize draft articles with YAML frontmatter including title, date, topics, related, and abstract. Expand README drafts section into a table listing all drafts with topics. Add "Testing Telos" to published articles.
Gregory Gauthier 2026-04-20 10:58:49 +01:00
parent 544b773e8f
commit 0fc66fedcb
10 changed files with 119 additions and 24 deletions
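As context for the diffs below: the standardized frontmatter is plain YAML between `---` fences. The following is a minimal sketch of a checker for the five fields named in the commit message; the script name, the PyYAML dependency, and the specific validation rules are illustrative assumptions, not something this commit ships.

```python
# check_frontmatter.py -- illustrative sketch only; this commit does not ship it.
# Checks the five fields being standardized: title, date, topics, related, abstract.
# Assumes PyYAML is available (pip install pyyaml).
import datetime
import sys

import yaml

REQUIRED = {"title", "date", "topics", "related", "abstract"}


def check(path):
    """Return a problem description for `path`, or None if it looks fine."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    if not text.startswith("---\n"):
        return f"{path}: no frontmatter block"
    parts = text.split("---\n", 2)  # ["", frontmatter, body]
    if len(parts) < 3:
        return f"{path}: unterminated frontmatter"
    meta = yaml.safe_load(parts[1]) or {}
    missing = REQUIRED - meta.keys()
    if missing:
        return f"{path}: missing {sorted(missing)}"
    if not isinstance(meta["date"], datetime.date):
        return f"{path}: date should be an ISO date"
    if not all(isinstance(meta[k], list) for k in ("topics", "related")):
        return f"{path}: topics and related should be lists"
    return None


if __name__ == "__main__":
    problems = [msg for msg in map(check, sys.argv[1:]) if msg]
    print("\n".join(problems) if problems else "all frontmatter OK")
    sys.exit(1 if problems else 0)
```

Run as `python check_frontmatter.py articles/drafts/*.md`; the non-zero exit code makes it usable as a pre-commit or CI gate.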

README.md View File

@@ -17,11 +17,24 @@ A collection of articles on software testing, test engineering, and Agile methodologies
| [Perfection And Testing](articles/published/perfection-and-testing.md) | Why testing is about managing risk and confidence, not chasing perfection |
| [Resources For Testers](articles/published/resources-for-testers.md) | Recommended books, blogs, and learning resources for software testers |
| [What Makes Us Better](articles/published/what-makes-us-better.md) | Whether AI tools like Claude change the *kind* of craft, not just the speed |
| [Testing Telos](articles/published/testing-telos.md) | Four archetypal testing shapes, each the visible expression of a distinct engineering *telos* |
## Drafts
- [The Uses and Abuses of Test Automation](articles/drafts/uses-and-abuses.md)
- [Agile Stories](articles/drafts/agile-stories.md)
| Article | Topic |
|---------|-------|
| [The Uses and Abuses of Test Automation](articles/drafts/uses-and-abuses.md) | Whether automation expands the craft of testing or quietly becomes a different craft |
| [Agile Stories](articles/drafts/agile-stories.md) | User stories as actors, objects, and purposes — what this reveals about Agile |
| [When Not To Test](articles/drafts/when-not-to-test.md) | *Phronesis* about where testing effort is wasted or actively harmful |
| [The Oracle Problem](articles/drafts/the-oracle-problem.md) | How we know what "correct" means, and how oracles conflict |
| [Popper and the Risky Test](articles/drafts/popper-and-the-risky-test.md) | A good test as a falsifiable prediction — Popper's demarcation applied to suites |
| [Regression as Institutional Memory](articles/drafts/regression-as-institutional-memory.md) | Test suites as codified belief; failures as either discovery or amnesia |
| [On Flakiness](articles/drafts/on-flakiness.md) | Heraclitus and the non-deterministic test — flakiness as a signal, not noise |
| [Tacit Knowledge and the Testing Checklist](articles/drafts/tacit-knowledge-checklist.md) | Polanyi on what experienced testers know but cannot hand to a checklist or an LLM |
| [The Tester as Witness](articles/drafts/tester-as-witness.md) | The legal metaphor that dissolves the "tester as gatekeeper" confusion |
| [Test Environments and Plato's Cave](articles/drafts/test-environments-and-platos-cave.md) | Dev, staging, UAT, prod as caves with their own shadows |
| [Testing Probabilistic Systems](articles/drafts/testing-probabilistic-systems.md) | Why the pyramid breaks on ML, and what shape replaces it |
| [When Quality Becomes Quantity](articles/drafts/when-quality-becomes-quantity.md) | Goodhart's Law and the degradation of testing metrics once they become targets |
## Structure

articles/drafts/on-flakiness.md View File

@@ -1,5 +1,14 @@
On Flakiness — or, Heraclitus and the Non-Deterministic Test. You have direct pain here from the Appium Mac2 multi-monitor work. A flaky test is the software-testing expression of Heraclitus' river[1] — you can never step into the same test run twice. But a flaky test isn't nothing; it's a signal that one of your background assumptions about determinism is wrong. The usual move is to quarantine or delete; the philosophical move is to ask what the flakiness is telling you about your model of the system. Martin Fowler's "Eradicating Non-Determinism in Tests"[2] is the standard reference but deserves to be argued with rather than cited.
---
title: "On Flakiness"
date: 2026-04-20
topics: [philosophy, epistemology, craft]
related: []
abstract: >
  A flaky test is the software-testing expression of Heraclitus' river — you can never step into the same test run twice. Rather than quarantining or deleting, flakiness ought to be read as a signal that one of our background assumptions about determinism is wrong, and interrogated on those terms.
---
[1] https://plato.stanford.edu/entries/heraclitus/
[2] https://martinfowler.com/articles/nonDeterminism.html
On Flakiness — or, Heraclitus and the Non-Deterministic Test. You have direct pain here from the Appium Mac2 multi-monitor work. A flaky test is the software-testing expression of Heraclitus' river[^1] — you can never step into the same test run twice. But a flaky test isn't nothing; it's a signal that one of your background assumptions about determinism is wrong. The usual move is to quarantine or delete; the philosophical move is to ask what the flakiness is telling you about your model of the system. Martin Fowler's "Eradicating Non-Determinism in Tests"[^2] is the standard reference but deserves to be argued with rather than cited.
[^1]: https://plato.stanford.edu/entries/heraclitus/
[^2]: https://martinfowler.com/articles/nonDeterminism.html

articles/drafts/popper-and-the-risky-test.md View File

@@ -1,3 +1,12 @@
Popper and the Risky Test. You've covered modus tollens but stopped short of the inevitable: Popper's demarcation criterion[1]. A good test is a risky prediction — one the product could plausibly fail. Low-risk tests (tests that pass because there's nothing interesting they could catch) are the testing equivalent of unfalsifiable theories: they look like science but aren't. This would give you a rigorous vocabulary for the intuition behind "my regression suite is green but I don't feel confident."
---
title: "Popper and the Risky Test"
date: 2026-04-20
topics: [philosophy, epistemology]
related: []
abstract: >
  A good test is a risky prediction — one the product could plausibly fail. Low-risk tests that pass because there is nothing interesting they could catch are the testing equivalent of unfalsifiable theories, and Popper's demarcation criterion gives us the vocabulary to say so.
---
[1] https://plato.stanford.edu/entries/popper/
Popper and the Risky Test. You've covered modus tollens but stopped short of the inevitable: Popper's demarcation criterion[^1]. A good test is a risky prediction — one the product could plausibly fail. Low-risk tests (tests that pass because there's nothing interesting they could catch) are the testing equivalent of unfalsifiable theories: they look like science but aren't. This would give you a rigorous vocabulary for the intuition behind "my regression suite is green but I don't feel confident."
[^1]: https://plato.stanford.edu/entries/popper/

articles/drafts/regression-as-institutional-memory.md View File

@@ -1,4 +1,13 @@
Regression as Institutional Memory. A test suite is the codified set of claims an organisation has committed to believing about itself. Every regression failure is either a discovery (the world changed) or an amnesia event (we forgot why we believed this). The epistemology angle here is rich — Wittgenstein's On Certainty[1] is a good touchstone. This would also give you a stronger argument than you've yet made for why deleting tests is sometimes the right move.
---
title: "Regression as Institutional Memory"
date: 2026-04-20
topics: [philosophy, epistemology, craft]
related: []
abstract: >
  A test suite is the codified set of claims an organisation has committed to believing about itself. Every regression failure is either a discovery that the world has changed, or an amnesia event in which we have forgotten why we once believed the claim — and distinguishing the two is the real work.
---
[1] https://plato.stanford.edu/entries/wittgenstein/#OnCe
Regression as Institutional Memory. A test suite is the codified set of claims an organisation has committed to believing about itself. Every regression failure is either a discovery (the world changed) or an amnesia event (we forgot why we believed this). The epistemology angle here is rich — Wittgenstein's On Certainty[^1] is a good touchstone. This would also give you a stronger argument than you've yet made for why deleting tests is sometimes the right move.
[^1]: https://plato.stanford.edu/entries/wittgenstein/#OnCe

articles/drafts/tacit-knowledge-checklist.md View File

@@ -1,4 +1,13 @@
Tacit Knowledge and the Testing Checklist. Polanyi's The Tacit Dimension[1] — "we know more than we can tell" — is the best available account of why experienced testers catch things that junior ones (and ISTQB certifications) miss. This dovetails neatly with your Competent Tester / Spolsky post and gives you ammunition against the Claude-first mandate you've worried about elsewhere: phronesis cannot be handed to a new hire via checklist, and certainly cannot be handed to an LLM.
---
title: "Tacit Knowledge and the Testing Checklist"
date: 2026-04-20
topics: [philosophy, craft, epistemology]
related: []
abstract: >
  Polanyi's claim that "we know more than we can tell" is the best available account of why experienced testers catch what juniors and certifications miss. Phronesis cannot be handed to a new hire via checklist, and certainly cannot be handed to a language model.
---
[1] https://press.uchicago.edu/ucp/books/book/chicago/T/bo6035368.html
Tacit Knowledge and the Testing Checklist. Polanyi's The Tacit Dimension[^1] — "we know more than we can tell" — is the best available account of why experienced testers catch things that junior ones (and ISTQB certifications) miss. This dovetails neatly with your Competent Tester / Spolsky post and gives you ammunition against the Claude-first mandate you've worried about elsewhere: phronesis cannot be handed to a new hire via checklist, and certainly cannot be handed to an LLM.
[^1]: https://press.uchicago.edu/ucp/books/book/chicago/T/bo6035368.html

articles/drafts/test-environments-and-platos-cave.md View File

@@ -1,3 +1,12 @@
Test Environments as Platonic Shadows. "Works on my machine" isn't a joke; it's an ontological problem. Dev, staging, UAT, prod — each is a cave with its own set of shadows. Plato's cave[1] maps with almost embarrassing precision. This piece could sit beside Perturbation Theory as another "borrowed-framework" essay and give you a natural home for your AWS HealthImaging / cross-manufacturer DICOM work, where the "same" data behaves differently in different environments.
---
title: "Test Environments and Plato's Cave"
date: 2026-04-20
topics: [philosophy, epistemology]
related: []
abstract: >
"Works on my machine" is not a joke but an ontological problem. Dev, staging, UAT, and prod are each caves with their own shadows — and the tester's job is to work out which projections can be trusted as evidence about the form beyond.
---
[1] https://plato.stanford.edu/entries/plato/
Test Environments as Platonic Shadows. "Works on my machine" isn't a joke; it's an ontological problem. Dev, staging, UAT, prod — each is a cave with its own set of shadows. Plato's cave[^1] maps with almost embarrassing precision. This piece could sit beside Perturbation Theory as another "borrowed-framework" essay and give you a natural home for your AWS HealthImaging / cross-manufacturer DICOM work, where the "same" data behaves differently in different environments.
[^1]: https://plato.stanford.edu/entries/plato/

articles/drafts/tester-as-witness.md View File

@@ -1,5 +1,14 @@
The Tester as Witness. You've done inspector, scientist, user, explorer, investigative journalist. The legal metaphor is conspicuously absent. A witness does not argue the case, does not render the verdict, and is not the prosecution — but their testimony is what the court's judgment rests on. This cleanly dissolves the "tester as gatekeeper" confusion you raised in Five Essential Lessons. Hume on testimony[1] and C.A.J. Coady's Testimony: A Philosophical Study[2] are obvious references.
---
title: "The Tester as Witness"
date: 2026-04-20
topics: [philosophy, craft, epistemology]
related: []
abstract: >
  Inspector, scientist, explorer, investigative journalist — the legal metaphor is conspicuously absent from our vocabulary for testers. A witness does not argue the case, render the verdict, or stand for the prosecution, yet their testimony is what the court's judgment rests on. That distinction cleanly dissolves the "tester as gatekeeper" confusion.
---
[1] https://plato.stanford.edu/entries/hume/#AnEnHuUn
[2] https://global.oup.com/academic/product/testimony-9780198235514
The Tester as Witness. You've done inspector, scientist, user, explorer, investigative journalist. The legal metaphor is conspicuously absent. A witness does not argue the case, does not render the verdict, and is not the prosecution — but their testimony is what the court's judgment rests on. This cleanly dissolves the "tester as gatekeeper" confusion you raised in Five Essential Lessons. Hume on testimony[^1] and C.A.J. Coady's Testimony: A Philosophical Study[^2] are obvious references.
[^1]: https://plato.stanford.edu/entries/hume/#AnEnHuUn
[^2]: https://global.oup.com/academic/product/testimony-9780198235514

articles/drafts/testing-probabilistic-systems.md View File

@@ -1,4 +1,14 @@
Testing Probabilistic Systems. LiverMultiScan has ML components; cardiac T1 mapping produces distributions, not binaries. The testing pyramid was built for deterministic, functional code — it breaks on probabilistic systems, where "correctness" is a statistical property, not a per-invocation one. This is a natural sequel to Testing Telos: none of your four shapes quite fits ML. Google's "ML Test Score" paper[1] and Christian Kästner's "Machine Learning in Production"[2] are good starting points. This is also where your concern about LLMs and your day job most obviously meet.
---
title: "Testing Probabilistic Systems"
date: 2026-04-20
topics: [philosophy, craft]
related:
- testing-telos.md
abstract: >
  The testing pyramid was built for deterministic, functional code, and it breaks on probabilistic systems where "correctness" is a statistical property rather than a per-invocation one. ML components and signal-producing pipelines demand a different shape of test — and the usual telos-shaped diagrams do not quite accommodate them.
---
[1] https://research.google/pubs/the-ml-test-score-a-rubric-for-ml-production-readiness-and-technical-debt-reduction/
[2] https://ckaestne.github.io/seai/
Testing Probabilistic Systems. LiverMultiScan has ML components; cardiac T1 mapping produces distributions, not binaries. The testing pyramid was built for deterministic, functional code — it breaks on probabilistic systems, where "correctness" is a statistical property, not a per-invocation one. This is a natural sequel to Testing Telos: none of your four shapes quite fits ML. Google's "ML Test Score" paper[^1] and Christian Kästner's "Machine Learning in Production"[^2] are good starting points. This is also where your concern about LLMs and your day job most obviously meet.
[^1]: https://research.google/pubs/the-ml-test-score-a-rubric-for-ml-production-readiness-and-technical-debt-reduction/
[^2]: https://ckaestne.github.io/seai/
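To make "a statistical property, not a per-invocation one" concrete, here is an invented contrast between the two assertion styles; the model interface, the tolerances, and the percentile bound are all hypothetical, not taken from the draft.

```python
# Invented illustration: deterministic vs statistical notions of "passing".
import statistics


def test_deterministic(add):
    # Per-invocation correctness: one call, one exact expected value.
    assert add(2, 2) == 4


def test_statistical(model, labelled_cases):
    # Correctness as a property of the error distribution over a sample,
    # not of any single invocation. Both thresholds are hypothetical.
    errors = [abs(model(x) - truth) for x, truth in labelled_cases]
    assert statistics.mean(errors) < 0.05
    assert statistics.quantiles(errors, n=100)[94] < 0.15  # 95th percentile
```

A statistical test can pass while individual cases are badly wrong, which is exactly the shift in the meaning of "green" that the draft is pointing at.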

articles/drafts/the-oracle-problem.md View File

@@ -1,5 +1,14 @@
The Oracle Problem. This is the most glaring missing piece. Your entire framework asks *how do we know?* — but you haven't yet tackled the uniquely testing-flavoured version: how do we know what "correct" means? An oracle is whatever tells a test whether an output is right. In your world, oracles are sometimes requirements, sometimes expectations, sometimes customer satisfaction, sometimes regulator sign-off — and they conflict. Elaine Weyuker's original 1982 paper[1] on the oracle assumption and Doug Hoffman's "Heuristic Test Oracles"[2] are the obvious anchors. This also unifies your Categories-of-Testing triad: each of the three fact-kinds has its own oracle species.
---
title: "The Oracle Problem"
date: 2026-04-20
topics: [philosophy, epistemology, craft]
related: []
abstract: >
  The uniquely testing-flavoured version of "how do we know?" is: how do we know what "correct" means? An oracle is whatever tells a test whether an output is right, and in practice oracles are requirements, expectations, customer satisfaction, and regulator sign-off — all of which can conflict.
---
The Oracle Problem. This is the most glaring missing piece. Your entire framework asks *how do we know?* — but you haven't yet tackled the uniquely testing-flavoured version: how do we know what "correct" means? An oracle is whatever tells a test whether an output is right. In your world, oracles are sometimes requirements, sometimes expectations, sometimes customer satisfaction, sometimes regulator sign-off — and they conflict. Elaine Weyuker's original 1982 paper[^1] on the oracle assumption and Doug Hoffman's "Heuristic Test Oracles"[^2] are the obvious anchors. This also unifies your Categories-of-Testing triad: each of the three fact-kinds has its own oracle species.
1. https://dl.acm.org/doi/10.1093/comjnl/25.4.465
2. https://www.stickyminds.com/article/heuristic-test-oracles
[^1]: https://dl.acm.org/doi/10.1093/comjnl/25.4.465
[^2]: https://www.stickyminds.com/article/heuristic-test-oracles
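To see how oracle species can differ over the same function, here is an invented unit-conversion toy; none of it comes from the draft, and "heuristic" follows Hoffman's usage only loosely.

```python
# Invented example: three oracle species judging one conversion function.


def f_to_c(f):
    return (f - 32) * 5 / 9


def requirements_oracle():
    # Specification oracle: the spec fixes exact expected values.
    assert f_to_c(212) == 100
    assert f_to_c(32) == 0


def heuristic_oracle(increasing_temps):
    # Heuristic oracle: no expected value per input, only a property any
    # plausible output must satisfy (here, order preservation).
    outs = [f_to_c(t) for t in increasing_temps]
    assert outs == sorted(outs)


def consistency_oracle(t):
    # Round-trip oracle: composing with the inverse should return the input.
    def c_to_f(c):
        return c * 9 / 5 + 32

    assert abs(c_to_f(f_to_c(t)) - t) < 1e-9
```

When two such oracles disagree about an output, the conflict lives in the oracles rather than in the test that consulted them, which is the draft's point.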

articles/drafts/when-quality-becomes-quantity.md View File

@@ -1,4 +1,13 @@
When Quality Becomes Quantity — Goodhart's Law and the Metrics Trap. "When a measure becomes a target, it ceases to be a good measure."[1] This is the missing chapter of your Cucumber polemic. Coverage percentages, pass rates, defect counts — all of them degrade the moment they become OKRs. You already hint at this in Five Essential Lessons when you say "automation is a tool, not a goal"; Goodhart lets you prove it.
---
title: "When Quality Becomes Quantity"
date: 2026-04-20
topics: [philosophy, craft]
related: []
abstract: >
  Coverage percentages, pass rates, defect counts — all of them degrade the moment they become OKRs. Goodhart's Law is the missing chapter of any honest argument about metrics in testing, and it turns the slogan "automation is a tool, not a goal" into a demonstrable claim.
---
[1] https://en.wikipedia.org/wiki/Goodhart%27s_law
When Quality Becomes Quantity — Goodhart's Law and the Metrics Trap. "When a measure becomes a target, it ceases to be a good measure."[^1] This is the missing chapter of your Cucumber polemic. Coverage percentages, pass rates, defect counts — all of them degrade the moment they become OKRs. You already hint at this in Five Essential Lessons when you say "automation is a tool, not a goal"; Goodhart lets you prove it.
[^1]: https://en.wikipedia.org/wiki/Goodhart%27s_law