
# Test Coverage Assessment

**Date:** 2026-04-02
**Scope:** All packages in `cmd/`, `internal/`, and `config/`
**Test files reviewed:** 32 across 12 packages

---

## Overall: ~60% unit / ~20% integration

The pattern is consistent: **pure helper functions are well tested; the command execution paths are mostly untested**.

---

## Per-Package Breakdown

| Package | Est. Coverage | Notes |
|---|---|---|
| `internal/logger/` | ~85% | Comprehensive public API coverage |
| `internal/prompts/` | ~85% | All load paths + fallback covered |
| `internal/errors/` | ~80% | Error types + unwrapping covered |
| `internal/todo/` | ~80% | Bootstrap idempotency solid |
| `internal/linter/` | ~75% | Language detection strong; output parsing weak |
| `config/` | ~75% | Config getters good; file-load errors not tested |
| `internal/workon/` | ~70% | State transitions good; `Run()` itself untested |
| `internal/git/` | ~70% | Real git integration, good core coverage |
| `internal/grok/` | ~60% | Streaming + SSE parsing tested; error paths absent |
| `internal/version/` | ~50% | Only a variable-presence check |
| `internal/recipe/` | ~50% | Loading tested; execution engine (`Run()`, `refactorFiles()`, `handleApplyStep()`) entirely untested |
| `cmd/` | ~25–35% | Message builders tested; nearly all `run*()` functions untested |

---

## What's Systematically Untested

**Command execution** — Most `run*()` functions in `cmd/` have zero direct tests:
- `runAgent()`, `runChat()`, `runQuery()`, `runRecipe()`, `runTestgen()`, `runWorkon()`, `runChangelogCommand()`, `runDocs()`

**API error scenarios** — The grok client has no tests for: HTTP errors, rate limits, auth failures, malformed SSE chunks, timeouts, or stream cancellation.
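
One way to start closing this gap is a table-driven test against an `httptest` server. The sketch below is illustrative only: `doRequest`, `classifyStatus`, and the sentinel errors are hypothetical stand-ins, since this assessment does not show the real grok client API. The same shape should apply to whatever call the client actually exposes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

// Hypothetical sentinel errors; the real client's error types may differ.
var (
	ErrUnauthorized = errors.New("unauthorized")
	ErrRateLimited  = errors.New("rate limited")
)

// classifyStatus is a hypothetical helper that maps an HTTP status to an error.
func classifyStatus(code int) error {
	switch {
	case code == http.StatusUnauthorized:
		return ErrUnauthorized
	case code == http.StatusTooManyRequests:
		return ErrRateLimited
	case code >= 400:
		return fmt.Errorf("unexpected status %d", code)
	}
	return nil
}

// doRequest is a stand-in for the client call under test.
func doRequest(ctx context.Context, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return classifyStatus(resp.StatusCode)
}

func TestDoRequest_ErrorStatuses(t *testing.T) {
	cases := []struct {
		name   string
		status int
		want   error
	}{
		{"auth failure", http.StatusUnauthorized, ErrUnauthorized},
		{"rate limit", http.StatusTooManyRequests, ErrRateLimited},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			// The fake server answers every request with the status under test.
			srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
				w.WriteHeader(tc.status)
			}))
			defer srv.Close()

			if err := doRequest(context.Background(), srv.URL); !errors.Is(err, tc.want) {
				t.Fatalf("got %v, want %v", err, tc.want)
			}
		})
	}
}

func TestDoRequest_Timeout(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Hold the response until the client gives up (bounded as a safety net).
		select {
		case <-r.Context().Done():
		case <-time.After(2 * time.Second):
		}
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	if err := doRequest(ctx, srv.URL); !errors.Is(err, context.DeadlineExceeded) {
		t.Fatalf("got %v, want context.DeadlineExceeded", err)
	}
}
```

Malformed SSE chunks and stream cancellation could be covered the same way, by having the handler write a truncated body or by cancelling the context mid-stream.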

**Recipe execution** — `internal/recipe/` tests only YAML loading. The entire execution side (`Run`, `refactorFiles`, `handleApplyStep`, `executeReadOnlyShell`) is untouched.

**User interaction** — All interactive confirmation/prompt flows are untested. No mock stdin anywhere.
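
A common way to make such flows testable is to have the prompt read from an `io.Reader` instead of `os.Stdin` directly. The following is a minimal sketch under that assumption; `confirm` and `answerIsYes` are hypothetical names and do not describe how the existing prompts are implemented.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"testing"
)

// answerIsYes reports whether a raw input line counts as confirmation.
// Keeping the parsing separate from the I/O makes it trivially testable.
func answerIsYes(line string) bool {
	switch strings.ToLower(strings.TrimSpace(line)) {
	case "y", "yes":
		return true
	}
	return false
}

// confirm is a hypothetical prompt helper. Production code would pass
// os.Stdin and os.Stdout; tests pass in-memory readers and writers.
func confirm(in io.Reader, out io.Writer, question string) bool {
	fmt.Fprintf(out, "%s [y/N]: ", question)
	// A read error (including EOF with no input) is treated as "no".
	line, _ := bufio.NewReader(in).ReadString('\n')
	return answerIsYes(line)
}

func TestConfirm(t *testing.T) {
	cases := []struct {
		input string
		want  bool
	}{
		{"y\n", true},
		{"YES\n", true},
		{"n\n", false},
		{"\n", false},
		{"", false}, // stdin closed without input
	}
	for _, tc := range cases {
		var out strings.Builder
		got := confirm(strings.NewReader(tc.input), &out, "Apply changes?")
		if got != tc.want {
			t.Errorf("confirm(%q) = %v, want %v", tc.input, got, tc.want)
		}
		if !strings.Contains(out.String(), "Apply changes?") {
			t.Errorf("prompt not written for input %q", tc.input)
		}
	}
}
```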

**File system errors** — Read-only paths, permission failures, disk full — none tested.
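
Missing-file and permission failures can usually be provoked with `t.TempDir()` and `os.Chmod`. The sketch below assumes a Unix-like system and a hypothetical `readConfig` loader; it is not the project's actual config code. Disk-full conditions are harder to simulate and are not covered here.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"testing"
)

// readConfig is a hypothetical loader that wraps the underlying error
// so callers can still match it with errors.Is.
func readConfig(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read config %q: %w", path, err)
	}
	return data, nil
}

func TestReadConfig_Missing(t *testing.T) {
	_, err := readConfig(filepath.Join(t.TempDir(), "missing.toml"))
	if !errors.Is(err, fs.ErrNotExist) {
		t.Fatalf("got %v, want fs.ErrNotExist", err)
	}
}

func TestReadConfig_PermissionDenied(t *testing.T) {
	// Root bypasses file permission checks, so the test would not be meaningful.
	if os.Geteuid() == 0 {
		t.Skip("running as root: permission checks are bypassed")
	}
	path := filepath.Join(t.TempDir(), "config.toml")
	if err := os.WriteFile(path, []byte("x = 1\n"), 0o600); err != nil {
		t.Fatal(err)
	}
	// Remove all permissions so the next read fails.
	if err := os.Chmod(path, 0); err != nil {
		t.Fatal(err)
	}
	_, err := readConfig(path)
	if !errors.Is(err, fs.ErrPermission) {
		t.Fatalf("got %v, want fs.ErrPermission", err)
	}
}
```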

---

## Highest-Impact Gaps

1. `internal/recipe/` — execution engine is production code with zero test coverage
2. `internal/grok/` — error path coverage missing entirely for a network client
3. `cmd/agent.go`, `cmd/chat.go`, `cmd/query.go` — core UX features without tests
4. `config/` — file parsing errors and env var overrides not tested

---

## Strengths

- Good use of `t.TempDir()` for isolation
- Real git integration in `internal/git/` tests (not mocked)
- Mock injection via function variables (git runner, API client) makes testing feasible
- `t.Parallel()` used consistently where appropriate
- Error type chain verification (`errors.Is`) is present
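
For reference, the function-variable injection pattern mentioned above typically looks something like the sketch below. The names `runGit` and `currentBranch` are hypothetical and chosen only for illustration; the project's actual seams may be shaped differently.

```go
package main

import (
	"os/exec"
	"strings"
	"testing"
)

// runGit is a hypothetical package-level seam. Production code shells out
// to git; tests replace the variable with a fake.
var runGit = func(args ...string) (string, error) {
	out, err := exec.Command("git", args...).Output()
	return string(out), err
}

// currentBranch depends on git only through the runGit variable.
func currentBranch() (string, error) {
	out, err := runGit("rev-parse", "--abbrev-ref", "HEAD")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(out), nil
}

func TestCurrentBranch(t *testing.T) {
	// Swap in the fake and restore the original when the test ends.
	orig := runGit
	t.Cleanup(func() { runGit = orig })
	runGit = func(args ...string) (string, error) {
		return "feature/x\n", nil
	}

	got, err := currentBranch()
	if err != nil {
		t.Fatal(err)
	}
	if got != "feature/x" {
		t.Fatalf("got %q, want %q", got, "feature/x")
	}
}
```

One trade-off worth noting: a test that reassigns a package-level variable should not call `t.Parallel()`, because concurrent tests would race on the shared variable.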

## Weaknesses

- The "Live" test pattern (`TestScaffoldCmd_Live`) is gated on `testing.Short()`, but with semantics inverted from the Go convention, under which a slow or external test skips itself when `-short` is set
- No benchmark tests anywhere
- No CI separation between unit and integration tests; "live" tests silently depend on the environment
- Mocking strategy is inconsistent across packages (some use testify mocks, some manual function variables)
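
For comparison, the conventional gating for a live test is sketched below: the test skips itself under `-short`, and also skips when its external prerequisite is absent. `liveSkipReason`, `TestExample_Live`, and the `EXAMPLE_API_KEY` variable are placeholders for illustration, not names taken from the codebase.

```go
package main

import (
	"os"
	"testing"
)

// liveSkipReason returns a non-empty reason when a live test should be
// skipped, and an empty string when it should run.
func liveSkipReason(short bool, apiKey string) string {
	switch {
	case short:
		return "skipping live test in -short mode"
	case apiKey == "":
		return "skipping live test: no API key in environment"
	}
	return ""
}

func TestExample_Live(t *testing.T) {
	// EXAMPLE_API_KEY is a placeholder environment variable name.
	if reason := liveSkipReason(testing.Short(), os.Getenv("EXAMPLE_API_KEY")); reason != "" {
		t.Skip(reason)
	}
	// Calls against the real service would go here.
}
```

With this shape, `go test -short ./...` runs only the fast tests, which gives CI a simple way to separate the unit pass from the integration pass.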

---

## Priority Recommendations

**High:** Add tests for `internal/recipe/` execution, grok client error scenarios, and `runAgent()`/`runChat()`/`runQuery()`.

**Medium:** Test `config/` file-load failure paths, git error scenarios (uninitialized repo), and file permission errors.

**Low:** Standardize the Live test pattern, add benchmarks for SSE streaming, add concurrent operation tests.
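
As a starting point for the SSE benchmark, a sketch along these lines may be enough. `dataPayload` is a hypothetical line parser written for this example; a real benchmark would call the project's own SSE parsing code instead.

```go
package main

import (
	"bufio"
	"strings"
	"testing"
)

// dataPayload extracts the payload of an SSE "data:" line. Per the SSE
// format, a single space after the colon is optional and is stripped.
func dataPayload(line string) (string, bool) {
	if !strings.HasPrefix(line, "data:") {
		return "", false
	}
	return strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "), true
}

// sink keeps the compiler from optimizing the parsed result away.
var sink int

func BenchmarkSSEDataLines(b *testing.B) {
	// 1000 events, each terminated by a blank line.
	stream := strings.Repeat("data: {\"delta\":\"hi\"}\n\n", 1000)
	b.SetBytes(int64(len(stream)))
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sc := bufio.NewScanner(strings.NewReader(stream))
		for sc.Scan() {
			if payload, ok := dataPayload(sc.Text()); ok {
				sink += len(payload)
			}
		}
	}
}
```

Run it with `go test -bench=SSE -benchmem`; `b.SetBytes` makes the output include throughput in MB/s.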