work-blog/articles/drafts/testing-probabilistic-systems.md

4 lines
739 B
Markdown
Raw Normal View History

Testing Probabilistic Systems. LiverMultiScan has ML components; cardiac T1 mapping produces distributions not binaries. The testing pyramid was built for deterministic, functional code — it breaks on probabilistic systems, where "correctness" is a statistical property, not a per-invocation one. This is a natural sequel to Testing Telos: none of your four shapes quite fits ML. Google's "ML Test Score" paper[1] and Christian Kästner's "Machine Learning in Production"[2] are good starting points. This is also where your concern about LLMs and your day job most obviously meet.
[1] https://research.google/pubs/the-ml-test-score-a-rubric-for-ml-production-readiness-and-technical-debt-reduction/
[2] https://ckaestne.github.io/seai/