

Perturb- What?

Perturbation Theory is a powerful mathematical framework in physics used to find approximate solutions to complex problems that can't be solved exactly. You tweak a well-understood system slightly, in a controlled way, and see how those small changes ripple through. This technique is ubiquitous in quantum mechanics, but it also pops up in classical mechanics, electromagnetism, and even cosmology. The core idea: Start with a "simple" problem you can solve exactly, then treat the complicating factors as tiny "perturbations" and calculate their effects step by step.

Consider the role of Jupiter in calculating planetary orbits. Approximating the trajectory of a planet orbiting the Sun (the exact, solvable part) while accounting for the slight gravitational tug from Jupiter (the perturbation) produces ridiculously messy equations without the aid of something like perturbation theory.

Many far more sophisticated real-world physical systems are too intricate for closed-form solutions. For example:

  • In quantum mechanics, the Schrödinger equation for multi-electron atoms or molecules is nightmarishly complicated.

  • In particle physics, interactions between quarks or fields involve infinities that exact math can't tame.

Perturbation theory shines when the "extra stuff" (like interactions or external fields) is weak compared to the main system.
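To make the recipe concrete, here is a minimal numeric sketch (the 2×2 system is made up purely for illustration) comparing the exact ground-state energy of a perturbed system against its first-order estimate, E ≈ E₀ + λ⟨ψ₀|V|ψ₀⟩:

```python
import numpy as np

# Unperturbed system: a diagonal "Hamiltonian" we can solve exactly.
H0 = np.diag([1.0, 3.0])

# A small perturbation coupling the two states.
V = np.array([[0.2, 0.1],
              [0.1, -0.2]])

lam = 0.05  # perturbation strength (lambda)

# Exact ground-state energy of the perturbed system.
exact = np.linalg.eigvalsh(H0 + lam * V)[0]

# First-order estimate: E ~ E0 + lam * <psi0|V|psi0>.
psi0 = np.array([1.0, 0.0])  # ground state of H0
first_order = H0[0, 0] + lam * psi0 @ V @ psi0

print(exact, first_order)  # the two values agree up to O(lambda^2)
```

The leftover error shrinks like λ², which is exactly the sense in which perturbation theory trades exactness for tractability.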

What Does That Have To Do With Software Testing?

A “Perturbation”

As a software tester in a large, complex environment, I wonder if the technique of "tweaking a well-understood system slightly and seeing how those small changes ripple through" could be applied to a practice like exploratory testing. Very often, I deal in "complex problems" and "approximate solutions" for them. I imagine a principle like the one behind the "perturbation theory" might be extremely useful in testing.

In fact, perturbation theory's elegance in handling complexity through controlled approximations translates beautifully to software testing, especially in exploratory contexts where you're navigating black-box behaviours without exhaustive scripting or extremely thorough documentation. It's not just a loose metaphor, either; software engineering has formalised similar ideas under terms like perturbation testing and chaos engineering, which treat the application as that "well-understood system" and introduce deliberate tweaks to reveal hidden ripples (bugs, vulnerabilities, or resilience gaps):

  1. Perturbation Testing (for Domain Errors and Vulnerabilities):

    • This directly echoes the physics concept: You perturb arithmetic expressions or environmental variables in code to detect faults that slip past traditional unit tests. For instance, in a financial app, tweak a calculation like total = price * quantity by swapping operators (+ instead of *) or injecting off-by-one errors, then run the perturbed code and compare outputs. Ripples? Silent overflows leading to bad trades.

    • In exploratory terms: During sessions, use this to probe edge cases. Tooling tip: browser dev tools or Postman can inject perturbations like malformed JSON payloads. It's great for "approximate solutions" in large systems, as you can automate low-order checks and explore higher-order effects manually.

  2. Chaos Engineering (for System-Wide Ripples):

    • Here, perturbations are intentional failures (e.g., killing pods in Kubernetes or spiking latency) injected into production-like environments to observe holistic responses. Netflix pioneered this to harden streaming services—think: Tweak network conditions slightly and watch if recommendations degrade gracefully or cascade into outages.

    • Exploratory angle: Frame your charters around "steady-state hypotheses" (e.g., "The app handles 10% traffic spikes without dropping sessions"). Start small (λ ≈ 0.1), measure, then scale. Tools like Gremlin or Litmus make it low-friction for testers. In complex setups, this uncovers non-obvious interactions, like a microservice tweak rippling to auth timeouts.
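The operator-swap idea from point 1 fits in a few lines. Here is a toy mutation-style harness, with a made-up `total()` function standing in for the financial calculation; the point is that weak test data lets perturbed variants "survive" undetected:

```python
# A hypothetical order-total calculation we want to perturbation-test.
def total(price, quantity):
    return price * quantity

# Perturbed variants: each swaps the operator, mimicking a likely coding slip.
perturbations = {
    "plus":  lambda p, q: p + q,
    "minus": lambda p, q: p - q,
    "power": lambda p, q: p ** q,
}

def survives(test_cases):
    """Return the perturbations our test data fails to distinguish
    from the original; each survivor marks a blind spot."""
    survivors = []
    for name, mutant in perturbations.items():
        if all(mutant(p, q) == total(p, q) for p, q in test_cases):
            survivors.append(name)
    return survivors

print(survives([(2, 2)]))          # weak data: ['plus', 'power'] survive
print(survives([(2, 2), (3, 5)]))  # richer data kills every mutant: []
```

A surviving mutant is the ripple you were looking for: it tells you the inputs you chose cannot tell correct code apart from plausibly broken code.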

How Would That Even Work?

Figuring It Out

In perturbation theory, you baseline a solvable model (H₀) and layer in small disturbances (λV) to approximate outcomes iteratively. In testing:

  • Unperturbed baseline: The "happy path" or nominal workflow—e.g., a standard user login with valid creds in your app. This is your exactly solvable starting point, mapped out via quick smoke tests or session charters.

  • Perturbations: Introduce minor, controlled variations—invalid inputs, timing delays, resource constraints, or config tweaks—and observe the "order-by-order" effects. First-order: Does it just shift the output slightly? Second-order: Does it cascade into failures elsewhere (e.g., a UI glitch triggering a backend deadlock)?

  • Approximate solutions: You don't need to test every combo (impossible in complex systems); instead, prioritise perturbations based on risk (e.g., high-impact areas like auth or payments) and iterate until the ripples stabilise or reveal breakpoints.
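The order-by-order observation above can be sketched directly. The two-step workflow below is invented for illustration (the names and validation rules are assumptions, not our real system): a perturbed input can fail immediately (first-order), cascade into a later step (second-order), or be absorbed entirely:

```python
def login(username):
    """Step 1: naive login that only rejects empty input."""
    if not username:
        raise ValueError("empty username")
    return {"user": username.strip().lower()}

def dashboard(session):
    """Step 2: downstream step that assumes a clean username."""
    user = session["user"]
    if not user.isalnum():
        raise RuntimeError(f"bad user key: {user!r}")
    return f"dashboard for {user}"

def classify(username):
    """Label the ripple a perturbed input causes, order by order."""
    try:
        session = login(username)
    except Exception:
        return "first-order: fails at the point of perturbation"
    try:
        dashboard(session)
    except Exception:
        return "second-order: cascades into a downstream step"
    return "no ripple: output merely shifts"

for p in ["Alice", "  ALICE  ", "", "Al ice"]:
    print(repr(p), "->", classify(p))
```

The second-order case is the interesting one: the perturbation point succeeds, and the failure only surfaces later, exactly the "UI glitch triggering a backend deadlock" pattern.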

This mindset shifts exploratory testing from pure ad-hoc wandering to a more structured "scientific experiment," where each tweak builds confidence in (or debunks) the system's robustness. It's approximate because real environments are chaotic, but that's the point: You're modelling resilience under uncertainty, much like physicists approximate quantum interactions.

Applying It to Our Workflows

In a complex environment such as ours, starting simple avoids overload. Here are some examples of how we could apply a “perturbation theory” approach to testing at Perspectum:

  • Charter Design: "Perturb user inputs in the data-selection flow and map first-order UI ripples." Log variations systematically (e.g., via session notes in Rapid Reporter).

  • Risk-Guided Scaling: Use historical bug data to pick perturbations—focus on "weak couplings" like API integrations.

  • Hybrid with Automation: Pair exploratory perturbations with scripted “fuzzing” frameworks (for example, AFL++, which is useful with C++ apps), then explore the “ripples” with follow-up charters.

  • Metrics for "Good Enough": Track coverage not by lines-of-code, but by "perturbation tolerance" (e.g., % of tweaks that stay within SLAs). This is just one of dozens of ways to make "coverage" a high-resolution signal for quality.
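The "perturbation tolerance" metric in the last bullet is easy to compute from session notes. A small sketch with invented log entries and a hypothetical 500 ms SLA:

```python
# Hypothetical session log: each perturbation with the latency it produced,
# judged against a made-up 500 ms SLA.
SLA_MS = 500

session_log = [
    {"perturbation": "malformed JSON",       "latency_ms": 120, "error": False},
    {"perturbation": "10% traffic spike",    "latency_ms": 480, "error": False},
    {"perturbation": "revoke role mid-call", "latency_ms": 90,  "error": True},
    {"perturbation": "corrupt DICOM header", "latency_ms": 700, "error": False},
]

def perturbation_tolerance(log, sla_ms):
    """Share of tweaks the system absorbed within the SLA, error-free."""
    ok = sum(1 for e in log if not e["error"] and e["latency_ms"] <= sla_ms)
    return ok / len(log)

print(f"{perturbation_tolerance(session_log, SLA_MS):.0%}")  # prints "50%"
```

Tracked sprint over sprint, a number like this turns "we poked at it and it seemed fine" into a trend you can act on.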

This approach could make our testing more predictive and less reactive, turning approximations into actionable insights.

How It Might Look In Practice

Here is an example using Elizabeth Hendrickson's "Explore It!" approach to exploratory testing. The framework employs a concise charter statement, focused data collection, and time-bound sessions. Here is how we might synthesise a "Perturbation" version of such a framework:

The charter treats the whole service working “correctly” as our "unperturbed baseline" (the nominal end-to-end workflow) and introduces controlled "perturbations" (small tweaks to inputs, timings, or configs) to reveal ripples (e.g., data integrity issues, SaMD analysis issues, or report inaccuracies). This keeps the session structured yet flexible, ideal for a complex, regulated domain like medical data handling.

In practice, you'd document this in a Jira ticket or a Confluence page. Ideally, it would be run during a sprint as a pair: a test engineer and a developer pair-testing together. Aim for 60–120 minutes to avoid fatigue, and debrief afterward to log insights as stories or defects.

Example Charter Details

**Charter Statement:** Explore the resilience of the end-to-end study data pipeline by perturbing key workflows around data uploads, associations, downloads, SaMD analyses, and report distribution. Focus on first- and second-order ripples in data integrity, usability, and compliance. Hypothesis: small input or timing perturbations will expose integration gaps without cascading into full failures.

**Scope & Focus Areas:**

  • Baseline Workflow: Simulate a "happy path" cycle: customer uploads valid DICOM files (e.g., a 100MB anonymised scan); associate with a trial project/clinical diagnosis; download to an analyst workstation; run SaMD analysis (e.g., liver segmentation); generate a PDF report; distribute the report via the appropriate channel.

  • Perturbations (prioritise 3–5 based on risk; start small, λ ≈ 0.1–0.5):

    1. Input tweaks: malformed uploads (e.g., corrupted headers, oversized files >500MB, or non-DICOM formats like JPEG).

    2. Association delays: slow portal uploads or intermittent connection problems (simulate with browser throttling or proxy tools like Charles).

    3. Download/access perturbs: partial downloads, concurrent analyst sessions, or role-based access tweaks (e.g., revoke mid-download).

    4. ML/visualisation ripples: inject noisy data into analyses (e.g., add synthetic artifacts); perturb viz params (e.g., wrong slice orientations).

    5. Report generation: tweak output formats (e.g., invalid email domains) or trigger bulk distributions.

  • Out of Scope: full performance/load testing or hardware-level scanner sims; focus on app-layer behaviours.

**Resources & Setup:** Environment: QASDEV or UAT. Tools: browser (Chrome DevTools for perturbations), Postman for API mocks, sample DICOM files (from public datasets like TCIA), screen recorder (e.g., Loom). Data: from the S3 test data bucket.

**Time Box:** 90 minutes total: 10 min setup & baseline run; 60 min perturb & observe (cycle through 2–3 areas); 20 min debrief & note promising leads.

**Stop Conditions:** Time's up; 3+ high-severity issues found (e.g., data leakage); hypothesis validated/invalidated (e.g., no ripples beyond first-order errors); boredom or a blocker (e.g., auth outage: escalate).

**Heuristics & Questions (Perturbation-Guided Probes):** What happens when a perturbation hits a "weak coupling" (e.g., the ML model on bad data)? First-order: immediate error? Second-order: corrupted downstream reports? Compliance lens: does any ripple expose PHI (protected health information)? Use SFDPOT (Structure, Functions, Data, Platforms, Operations, Time) to brainstorm tweaks on the fly.

Sample Session Notes (What It Might Look Like Post-Run)

During execution, jot real-time observations in a table or bullets—treating this as your "order-by-order expansion." Here's a fictional but realistic excerpt from a 90-min run:

| Time | Perturbation | Observation (Ripples) | Severity | Notes/Actions |
|---|---|---|---|---|
| 10:15 | Baseline: upload valid 50MB DICOM to Trial #123. | Full cycle completes: associated in 2s, downloaded clean, LMS accurate (complete segmentation), report delivered in 30s. No issues. | Info | Screenshot workflow; baseline stable. |
| 25:30 | Input tweak: corrupt header in zip upload (edit via hex editor). | Upload succeeds, but zip extraction and DICOM parsing fail. No data leak. Second-order: the analyst sees no worklist task. | Medium | Log as perf story; test retry limits. |
| 45:00 | Download perturb: concurrent access by two analysts (same file). | Race condition! Both analysts take the task for execution, but only one sees it on their worklist afterward. | High | Repro steps: [bulleted login sequence]. Escalate to backend team; potential sync bug. |
| 65:20 | ML perturb: add noise to slices in Horos. | LMS fails to segment. Results include unflagged low-confidence metrics, risky for clinical use. | Critical | Hypothesis busted: ripples cascade to a compliance violation. Propose input-validation guardrails. |
| 80:00 | Report tweak: invalid email (e.g., "test@nonexistent.com"). | Distribution queues but bounces silently; customer dashboard falsely shows "sent" status. No notification. | Medium | UI/UX issue; add a bounce webhook? |

Debrief Insights

  • Wins: Uncovered 2 high-impact bugs in 90 min—faster than scripted tests for this complexity.

  • Learnings: Perturbations amplified integration risks (e.g., ML sensitivity); next charter: Dive deeper into LMS perturbations.

  • Coverage Gaps: Only exercised LMS; BodyComp was not covered.

  • Perturbation Rating: Model worked well; higher-order ripples (e.g., from noise) were most valuable.

This charter keeps things approximate and iterative, much like perturbation theory—start with the solvable baseline, layer in tweaks, and approximate the system's "energy landscape" (stability).