# Test Coverage Assessment

**Date:** 2026-04-02

**Scope:** All packages in `cmd/`, `internal/`, and `config/`

**Test files reviewed:** 32 across 12 packages

---

## Overall: ~60% unit / ~20% integration

The pattern is consistent: **pure helper functions are well-tested; the actual command execution paths are mostly not**.

---

## Per-Package Breakdown

| Package | Est. Coverage | Notes |
|---|---|---|
| `internal/logger/` | ~85% | Comprehensive public API coverage |
| `internal/prompts/` | ~85% | All load paths + fallback covered |
| `internal/errors/` | ~80% | Error types + unwrapping covered |
| `internal/todo/` | ~80% | Bootstrap idempotency solid |
| `internal/linter/` | ~75% | Language detection strong; output parsing weak |
| `config/` | ~75% | Config getters good; file-load errors not tested |
| `internal/workon/` | ~70% | State transitions good; `Run()` itself untested |
| `internal/git/` | ~70% | Real git integration, good core coverage |
| `internal/grok/` | ~60% | Streaming + SSE parsing tested; error paths absent |
| `internal/version/` | ~50% | Just variable presence check |
| `internal/recipe/` | ~50% | Loading tested; execution engine (`Run()`, `refactorFiles()`, `handleApplyStep()`) entirely untested |
| `cmd/` | ~25–35% | Message builders tested; nearly all `run*()` functions untested |

---

## What's Systematically Untested

**Command execution:** Most `run*()` functions in `cmd/` have zero direct tests:

- `runAgent()`, `runChat()`, `runQuery()`, `runRecipe()`, `runTestgen()`, `runWorkon()`, `runChangelogCommand()`, `runDocs()`

**API error scenarios:** The grok client has no tests for HTTP errors, rate limits, auth failures, malformed SSE chunks, timeouts, or stream cancellation.

**Recipe execution:** `internal/recipe/` tests only YAML loading. The entire execution side (`Run`, `refactorFiles`, `handleApplyStep`, `executeReadOnlyShell`) is untouched.

**User interaction:** All interactive confirmation/prompt flows are untested. No mock stdin anywhere.

**File system errors:** Read-only paths, permission failures, and disk-full conditions are all untested.

---

## Highest-Impact Gaps

1. `internal/recipe/`: execution engine is production code with zero test coverage
2. `internal/grok/`: error path coverage missing entirely for a network client
3. `cmd/agent.go`, `cmd/chat.go`, `cmd/query.go`: core UX features without tests
4. `config/`: file parsing errors and env var overrides not tested

---

## Strengths

- Good use of `t.TempDir()` for isolation
- Real git integration in `internal/git/` tests (not mocked)
- Mock injection via function variables (git runner, API client) makes testing feasible
- `t.Parallel()` used consistently where appropriate
- Error type chain verification (`errors.Is`) is present

## Weaknesses

- The "Live" test pattern (`TestScaffoldCmd_Live`) uses `testing.Short()` logic inconsistently: it is gated, but the semantics are inverted from convention
- No benchmark tests anywhere
- No CI separation between unit and integration tests; "live" tests quietly depend on the environment
- Mocking strategy is inconsistent across packages (some use testify mocks, some manual function variables)

---

## Priority Recommendations

**High:** Add tests for `internal/recipe/` execution, grok client error scenarios, and `runAgent()`/`runChat()`/`runQuery()`.

**Medium:** Test `config/` file-load failure paths, git error scenarios (uninitialized repo), and file permission errors.

**Low:** Standardize the Live test pattern, add benchmarks for SSE streaming, add concurrent operation tests.
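
One way to close the user-interaction gap (no mock stdin) is to have prompt helpers accept an `io.Reader` and `io.Writer` instead of touching `os.Stdin` and `os.Stdout` directly. The sketch below is illustrative only: `confirm` and `answer` are hypothetical names, not functions from the reviewed codebase, and the `[y/N]` prompt format is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// confirm is a hypothetical helper (not from the reviewed codebase).
// It writes the prompt to out, reads one line from in, and reports whether
// the answer was "y" or "yes" (case-insensitive). Empty input or EOF
// without an answer is treated as "no".
func confirm(in io.Reader, out io.Writer, prompt string) (bool, error) {
	if _, err := fmt.Fprintf(out, "%s [y/N]: ", prompt); err != nil {
		return false, err
	}
	// ReadString returns whatever was read before EOF, so an answer
	// without a trailing newline is still honored.
	line, err := bufio.NewReader(in).ReadString('\n')
	if err != nil && err != io.EOF {
		return false, err
	}
	switch strings.ToLower(strings.TrimSpace(line)) {
	case "y", "yes":
		return true, nil
	default:
		return false, nil
	}
}

// answer simulates a user typing input at the prompt. It returns the
// decision together with everything the helper wrote to its output.
func answer(input string) (bool, string, error) {
	var out strings.Builder
	ok, err := confirm(strings.NewReader(input), &out, "Apply changes?")
	return ok, out.String(), err
}
```

In a `_test.go` file the same idea applies: a table-driven test can feed `strings.NewReader("y\n")` or an empty reader to the helper and assert on both the returned decision and the captured prompt text, while production code passes `os.Stdin` and `os.Stdout`.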