feat(drafts): add initial drafts for philosophy-inspired testing articles
Introduces nine new draft articles exploring intersections of software testing with philosophy, epistemology, and related concepts:

- On Flakiness (Heraclitus and non-deterministic tests)
- Popper and the Risky Test (demarcation criterion)
- Regression as Institutional Memory (Wittgenstein's On Certainty)
- Tacit Knowledge and the Testing Checklist (Polanyi's tacit dimension)
- Test Environments as Platonic Shadows (Plato's cave allegory)
- The Tester as Witness (legal metaphor and testimony)
- Testing Probabilistic Systems (ML and statistical testing)
- The Oracle Problem (oracles in testing frameworks)
- When Quality Becomes Quantity (Goodhart's Law and metrics)
parent 3cb83e0276
commit 544b773e8f
articles/drafts/on-flakiness.md (Normal file, 5 lines added)
@@ -0,0 +1,5 @@
On Flakiness — or, Heraclitus and the Non-Deterministic Test. You have direct pain here from the Appium Mac2 multi-monitor work. A flaky test is the software-testing expression of Heraclitus' river[1] — you can never step into the same test run twice. But a flaky test isn't nothing; it's a signal that one of your background assumptions about determinism is wrong. The usual move is to quarantine or delete; the philosophical move is to ask what the flakiness is telling you about your model of the system. Martin Fowler's "Eradicating Non-Determinism in Tests"[2] is the standard reference but deserves to be argued with rather than cited.
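A self-contained sketch could anchor the piece; the toy `eventually_ready` operation below is a hypothetical stand-in for the asynchronous window queries in the Appium work, not code from that project:

```python
import random
import time

def eventually_ready(delay):
    """Toy asynchronous operation: becomes ready after a variable delay."""
    start = time.monotonic()
    return lambda: time.monotonic() - start >= delay

def test_flaky():
    # Hidden assumption: the operation always settles within 50 ms.
    ready = eventually_ready(delay=random.uniform(0.01, 0.1))
    time.sleep(0.05)
    assert ready()  # passes on some runs, fails on others: the river

def test_listens_to_the_flakiness():
    # The fix is not a longer sleep but an explicit wait, which turns
    # the unstated belief about timing into a stated, testable timeout.
    ready = eventually_ready(delay=random.uniform(0.01, 0.1))
    deadline = time.monotonic() + 1.0
    while time.monotonic() < deadline and not ready():
        time.sleep(0.005)
    assert ready()
```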
[1] https://plato.stanford.edu/entries/heraclitus/
[2] https://martinfowler.com/articles/nonDeterminism.html
articles/drafts/popper-and-the-risky-test.md (Normal file, 3 lines added)
@@ -0,0 +1,3 @@
Popper and the Risky Test. You've covered modus tollens but stopped short of the inevitable: Popper's demarcation criterion[1]. A good test is a risky prediction — one the product could plausibly fail. Low-risk tests (tests that pass because there's nothing interesting they could catch) are the testing equivalent of unfalsifiable theories: they look like science but aren't. This would give you a rigorous vocabulary for the intuition behind "my regression suite is green but I don't feel confident."
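The contrast can be made concrete in a few lines; `parse_price` is a hypothetical function under test, invented only for illustration:

```python
def parse_price(text: str) -> float:
    """Hypothetical function under test."""
    return float(text.strip().lstrip("$"))

def test_unfalsifiable():
    # Low-risk: any implementation that returns at all will pass,
    # so a green result corroborates nothing.
    assert parse_price("$4.99") is not None

def test_risky_prediction():
    # Commits to behaviour the product could plausibly get wrong;
    # surviving this is what Popper would call corroboration.
    assert parse_price("$4.99") == 4.99
    assert parse_price(" $0.50 ") == 0.50
```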
[1] https://plato.stanford.edu/entries/popper/
articles/drafts/regression-as-institutional-memory.md (Normal file, 4 lines added)
@@ -0,0 +1,4 @@
Regression as Institutional Memory. A test suite is the codified set of claims an organisation has committed to believing about itself. Every regression failure is either a discovery (the world changed) or an amnesia event (we forgot why we believed this). The epistemology angle here is rich — Wittgenstein's On Certainty[1] is a good touchstone. This would also give you a stronger argument than you've yet made for why deleting tests is sometimes the right move.
[1] https://plato.stanford.edu/entries/wittgenstein/#OnCe
articles/drafts/tacit-knowledge-checklist.md (Normal file, 4 lines added)
@@ -0,0 +1,4 @@
Tacit Knowledge and the Testing Checklist. Polanyi's The Tacit Dimension[1] — "we know more than we can tell" — is the best available account of why experienced testers catch things that junior ones (and ISTQB certifications) miss. This dovetails neatly with your Competent Tester / Spolsky post and gives you ammunition against the Claude-first mandate you've worried about elsewhere: phronesis cannot be handed to a new hire via checklist, and certainly cannot be handed to an LLM.
[1] https://press.uchicago.edu/ucp/books/book/chicago/T/bo6035368.html
articles/drafts/test-environments-and-platos-cave.md (Normal file, 3 lines added)
@@ -0,0 +1,3 @@
Test Environments as Platonic Shadows. "Works on my machine" isn't a joke, it's an ontological problem. Dev, staging, UAT, prod — each is a cave with its own set of shadows. Plato's cave[1] maps with almost embarrassing precision. This piece could sit beside Perturbation Theory as another "borrowed-framework" essay and give you a natural home for your AWS HealthImaging / cross-manufacturer DICOM work, where the "same" data behaves differently in different environments.
[1] https://plato.stanford.edu/entries/plato/
articles/drafts/tester-as-witness.md (Normal file, 5 lines added)
@@ -0,0 +1,5 @@
The Tester as Witness. You've done inspector, scientist, user, explorer, investigative journalist. The legal metaphor is conspicuously absent. A witness does not argue the case, does not render the verdict, and is not the prosecution — but their testimony is what the court's judgment rests on. This cleanly dissolves the "tester as gatekeeper" confusion you raised in Five Essential Lessons. Hume on testimony[1] and C.A.J. Coady's Testimony: A Philosophical Study[2] are obvious references.
[1] https://plato.stanford.edu/entries/hume/#AnEnHuUn
[2] https://global.oup.com/academic/product/testimony-9780198235514
articles/drafts/testing-probabilistic-systems.md (Normal file, 4 lines added)
@@ -0,0 +1,4 @@
Testing Probabilistic Systems. LiverMultiScan has ML components; cardiac T1 mapping produces distributions, not binaries. The testing pyramid was built for deterministic, functional code — it breaks on probabilistic systems, where "correctness" is a statistical property, not a per-invocation one. This is a natural sequel to Testing Telos: none of your four shapes quite fits ML. Google's "ML Test Score" paper[1] and Christian Kästner's "Machine Learning in Production"[2] are good starting points. This is also where your concern about LLMs and your day job most obviously meet.
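A sketch of what per-distribution assertions might look like; the toy estimator and the tolerances are illustrative stand-ins, not anything clinically meaningful:

```python
import random
import statistics

def noisy_t1_estimate(true_t1=1000.0, noise_sd=25.0):
    """Stand-in for a stochastic measurement pipeline (output in ms)."""
    return random.gauss(true_t1, noise_sd)

def test_estimates_are_statistically_correct():
    # Pinning the seed makes the statistical test itself reproducible;
    # the trade-off between reproducibility and power is article material.
    random.seed(42)
    samples = [noisy_t1_estimate() for _ in range(500)]
    # Correctness is a property of the sample, not of any single call.
    assert abs(statistics.mean(samples) - 1000.0) < 5.0
    assert statistics.stdev(samples) < 40.0
```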
[1] https://research.google/pubs/the-ml-test-score-a-rubric-for-ml-production-readiness-and-technical-debt-reduction/
[2] https://ckaestne.github.io/seai/
articles/drafts/the-oracle-problem.md (Normal file, 5 lines added)
@@ -0,0 +1,5 @@
The Oracle Problem. This is the most glaring missing piece. Your entire framework asks "how do we know?" — but you haven't yet tackled the uniquely testing-flavoured version: how do we know what "correct" means? An oracle is whatever tells a test whether an output is right. In your world, oracles are sometimes requirements, sometimes expectations, sometimes customer satisfaction, sometimes regulator sign-off — and they conflict. Elaine Weyuker's original 1982 paper[1] on the oracle assumption and Doug Hoffman's "Heuristic Test Oracles"[2] are the obvious anchors. This also unifies your Categories-of-Testing triad: each of the three fact-kinds has its own oracle species.
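Hoffman's heuristic oracles are small enough to sketch; the toy square-root below stands in for any computation whose exact expected output can't be independently specified:

```python
import math

def my_sqrt(x: float) -> float:
    """Function under test."""
    return x ** 0.5

def test_partial_oracles():
    for x in (0.25, 1.0, 2.0, 1e6):
        y = my_sqrt(x)
        # Inverse-function oracle: we may not know sqrt(x) independently,
        # but we know what squaring the result should give back.
        assert math.isclose(y * y, x, rel_tol=1e-9)
        # Property oracle: monotonicity over the domain.
        assert my_sqrt(x + 1.0) > y
        # Neither oracle certifies "correct"; each only rules out one
        # class of wrongness, which is why oracle species can conflict.
```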
[1] https://dl.acm.org/doi/10.1093/comjnl/25.4.465
[2] https://www.stickyminds.com/article/heuristic-test-oracles
articles/drafts/when-quality-becomes-quantity.md (Normal file, 4 lines added)
@@ -0,0 +1,4 @@
When Quality Becomes Quantity — Goodhart's Law and the Metrics Trap. "When a measure becomes a target, it ceases to be a good measure."[1] This is the missing chapter of your Cucumber polemic. Coverage percentages, pass rates, defect counts — all of them degrade the moment they become OKRs. You already hint at this in Five Essential Lessons when you say "automation is a tool, not a goal"; Goodhart lets you prove it.
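The degradation is demonstrable in miniature; `discount` is a hypothetical function, and both tests below achieve 100% line coverage of it:

```python
def discount(price: float, is_member: bool) -> float:
    """Hypothetical function under test."""
    return price * 0.5 if is_member else price

def test_gamed_for_coverage():
    # Executes every line and asserts nothing: the coverage target is
    # met while the quality it was a proxy for goes unmeasured.
    discount(100.0, True)
    discount(100.0, False)

def test_actually_checks_something():
    assert discount(100.0, True) == 50.0
    assert discount(100.0, False) == 100.0
```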
[1] https://en.wikipedia.org/wiki/Goodhart%27s_law