---
title: "Testing Probabilistic Systems"
date: 2026-04-20
topics: [philosophy, craft]
related:
  - testing-telos.md
abstract: >
  The testing pyramid was built for deterministic, functional code, and it breaks on probabilistic systems where "correctness" is a statistical property rather than a per-invocation one. ML components and signal-producing pipelines demand a different shape of test — and the usual telos-shaped diagrams do not quite accommodate them.
---

LiverMultiScan has ML components, and cardiac T1 mapping produces distributions, not binaries. The testing pyramid was built for deterministic, functional code; it breaks on probabilistic systems, where "correctness" is a statistical property, not a per-invocation one. This is a natural sequel to Testing Telos: none of your four shapes quite fits ML. Google's "ML Test Score" paper[^1] and Christian Kästner's "Machine Learning in Production"[^2] are good starting points. This is also where your concern about LLMs and your day job most obviously meet.
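
To make "statistical, not per-invocation" concrete, here is a minimal sketch of what such a test might look like. Everything in it is an assumption for illustration: `noisy_estimate` is a hypothetical stand-in for a probabilistic component (a true value plus Gaussian noise), and the thresholds (`max_bias`, `band`, `min_coverage`) are arbitrary numbers, not values from any real product or standard. The point is only the shape: the assertion is made about the distribution of many outputs, not about any single one.

```python
import random
import statistics


def noisy_estimate(true_value: float, rng: random.Random) -> float:
    """Hypothetical probabilistic component: the true value plus Gaussian noise."""
    return true_value + rng.gauss(0.0, 5.0)


def check_statistical_correctness(
    true_value: float,
    n_runs: int = 2000,
    seed: int = 0,
    max_bias: float = 1.0,       # illustrative tolerance on the mean error
    band: float = 15.0,          # illustrative per-sample tolerance band
    min_coverage: float = 0.95,  # fraction of samples that must fall inside the band
) -> dict:
    """Judge the component on aggregate behaviour over many invocations."""
    rng = random.Random(seed)  # seeded so the test itself is repeatable
    estimates = [noisy_estimate(true_value, rng) for _ in range(n_runs)]

    # Property 1: the estimator is not systematically off.
    bias = abs(statistics.fmean(estimates) - true_value)

    # Property 2: most individual outputs land within the tolerance band.
    coverage = sum(abs(e - true_value) <= band for e in estimates) / n_runs

    return {
        "bias": bias,
        "coverage": coverage,
        "passed": bias <= max_bias and coverage >= min_coverage,
    }


if __name__ == "__main__":
    result = check_statistical_correctness(true_value=100.0)
    print(result)
```

A classic unit test of the form `assert noisy_estimate(100.0, rng) == 100.0` would fail on essentially every call, even though the component is behaving exactly as designed. Seeding the generator keeps the test repeatable, at the cost of only exercising one sample of the distribution; that trade-off is itself part of what the pyramid has no vocabulary for.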

[^1]: https://research.google/pubs/the-ml-test-score-a-rubric-for-ml-production-readiness-and-technical-debt-reduction/
[^2]: https://ckaestne.github.io/seai/