Greg Gauthier c49f6d84ef docs: add WOMM certification badge and developer guides
- Add WOMM bronze certification SVG and text certificate
- Add CLAUDE.md for AI coding guidance
- Update README.md with certification badge
- Add TEST-COVERAGE.md in docs/developer-guide
2026-04-02 17:25:45 +01:00


Test Coverage Assessment

Date: 2026-04-02
Scope: All packages in cmd/, internal/, and config/
Test files reviewed: 32 across 12 packages


Overall: ~60% unit / ~20% integration

The pattern is consistent: pure helper functions are well-tested; the actual command execution paths are mostly not.


Per-Package Breakdown

| Package | Est. Coverage | Notes |
|---|---|---|
| internal/logger/ | ~85% | Comprehensive public API coverage |
| internal/prompts/ | ~85% | All load paths + fallback covered |
| internal/errors/ | ~80% | Error types + unwrapping covered |
| internal/todo/ | ~80% | Bootstrap idempotency solid |
| internal/linter/ | ~75% | Language detection strong; output parsing weak |
| config/ | ~75% | Config getters good; file-load errors not tested |
| internal/workon/ | ~70% | State transitions good; Run() itself untested |
| internal/git/ | ~70% | Real git integration; good core coverage |
| internal/grok/ | ~60% | Streaming + SSE parsing tested; error paths absent |
| internal/version/ | ~50% | Only checks that the version variables are present |
| internal/recipe/ | ~50% | Loading tested; execution engine (Run(), refactorFiles(), handleApplyStep()) entirely untested |
| cmd/ | ~25-35% | Message builders tested; nearly all run*() functions untested |

What's Systematically Untested

Command execution — Most run*() functions in cmd/ have zero direct tests:

  • runAgent(), runChat(), runQuery(), runRecipe(), runTestgen(), runWorkon(), runChangelogCommand(), runDocs()

API error scenarios — The grok client has no tests for: HTTP errors, rate limits, auth failures, malformed SSE chunks, timeouts, or stream cancellation.

Recipe execution: the internal/recipe/ tests cover only YAML loading. The entire execution side (Run, refactorFiles, handleApplyStep, executeReadOnlyShell) is untested.

User interaction — All interactive confirmation/prompt flows are untested. No mock stdin anywhere.

File system errors — Read-only paths, permission failures, disk full — none tested.


Highest-Impact Gaps

  1. internal/recipe/ — execution engine is production code with zero test coverage
  2. internal/grok/ — error path coverage missing entirely for a network client
  3. cmd/agent.go, cmd/chat.go, cmd/query.go — core UX features without tests
  4. config/ — file parsing errors and env var overrides not tested

Strengths

  • Good use of t.TempDir() for isolation
  • Real git integration in internal/git/ tests (not mocked)
  • Mock injection via function variables (git runner, API client) makes testing feasible
  • t.Parallel() used consistently where appropriate
  • Error type chain verification (errors.Is) is present

Weaknesses

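  • The "Live" test pattern (TestScaffoldCmd_Live) uses testing.Short() inconsistently: it is gated, but the gate is inverted relative to the usual Go convention, in which slow or environment-dependent tests call t.Skip when testing.Short() reports true (that is, under go test -short)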
  • No benchmark tests anywhere
  • No CI separation between unit and integration tests; "live" tests silently depend on the environment
  • Mocking strategy is inconsistent across packages (some use testify mocks, some manual function variables)
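For comparison, the conventional gate skips slow or environment-dependent tests when `go test -short` is set. The sketch below also requires an explicit opt-in variable, so live tests never run by accident. `liveSkipReason` and the `GROKKIT_LIVE_TESTS` variable are hypothetical. Passing testing.Short() and os.Getenv in as arguments keeps the decision itself testable.

```go
package main

// liveSkipReason returns a non-empty reason when a live (network or
// credential-dependent) test should be skipped, or "" when it may run.
// GROKKIT_LIVE_TESTS is a hypothetical opt-in variable name.
//
// Intended use in a _test.go file:
//
//	func TestScaffoldCmd_Live(t *testing.T) {
//		if reason := liveSkipReason(testing.Short(), os.Getenv); reason != "" {
//			t.Skip(reason)
//		}
//		// exercise the real command here
//	}
func liveSkipReason(short bool, getenv func(string) string) string {
	// Go convention: slow or environment-dependent tests skip under -short.
	if short {
		return "skipping live test in -short mode"
	}
	// Require an explicit opt-in so CI and local runs never hit the real API by accident.
	if getenv("GROKKIT_LIVE_TESTS") != "1" {
		return "set GROKKIT_LIVE_TESTS=1 to run live tests"
	}
	return ""
}
```

Build constraints offer another way to separate these tests in CI. Placing `//go:build integration` at the top of live test files excludes them unless tests run with `go test -tags integration`.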

Priority Recommendations

High: Add tests for internal/recipe/ execution, grok client error scenarios, and runAgent()/runChat()/runQuery().

Medium: Test config/ file-load failure paths, git error scenarios (uninitialized repo), and file permission errors.

Low: Standardize the Live test pattern, add benchmarks for SSE streaming, add concurrent operation tests.