# DICOM MCP Server — TODO
Two remaining items. The original four-item enhancement plan (2026-02-11) has been completed — `dicom_query` and `dicom_search` were implemented, along with four additional tools (`dicom_dump_tree`, `dicom_compare_uids`, `dicom_verify_segmentations`, `dicom_analyze_ti`). The items below are follow-on improvements identified during testing.
---
## 1. Private Tag Exploration Tool (`dicom_private_tags`)
**Priority:** High — this is the trickiest to get right but the most valuable for cross-manufacturer QA.
### Problem
Manufacturers store critical acquisition parameters in private (vendor-specific) DICOM tags
rather than standard public tags. This causes real issues in multi-vendor QA workflows:
- **Philips TE=0 quirk:** The Erasmus Achieva 1.5T dataset shows `EchoTime = 0` for several
Dixon series (Body mDixon THRIVE, Thigh Dixon Volume). The actual multi-point Dixon echo
times are stored in Philips private tags, not the standard `(0018,0081)` EchoTime field.
This was confirmed via `dicom_query` grouped by SeriesDescription on the Erasmus dataset —
328 of 648 Thigh Dixon files and 100 of 157 mDixon THRIVE files report TE=0.
- **Manufacturer encoding differences:** Siemens stores echo times in public tags normally
(e.g. MOST series TEs of 2.38–19.06 ms on Avanto_fit). Philips MOST TEs (2.371–18.961 ms)
are in public tags too, but Dixon TEs are hidden in private tags. GE embeds Dixon image type
info in `ImageType` fields rather than series descriptions.
- **Nested sequences and binary blobs:** Philips private tags frequently contain nested
DICOM sequences, and some values are only interpretable if you know the specific software
version. Binary data needs special handling to avoid dumping unreadable content.
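
The TE=0 check described above can be sketched in plain Python. This is only an illustration of the per-series count: `count_zero_echo_times` is a hypothetical helper, not part of the server, and it assumes the `(SeriesDescription, EchoTime)` pairs have already been extracted (for example from `dicom_query` output). The sample values are made up.

```python
from collections import defaultdict


def count_zero_echo_times(records):
    """Count files reporting EchoTime == 0 per series description.

    `records` is an iterable of (series_description, echo_time) pairs.
    Hypothetical helper: the real tool's data model may differ.
    Returns {description: (zero_count, total_count)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for description, echo_time in records:
        entry = totals[description]
        entry[1] += 1  # every file counts towards the total
        if echo_time is not None and float(echo_time) == 0.0:
            entry[0] += 1  # file reports TE=0 in the public tag
    return {desc: tuple(counts) for desc, counts in totals.items()}


# Made-up sample data, not taken from the Erasmus dataset.
sample = [
    ("Thigh Dixon Volume", 0.0),
    ("Thigh Dixon Volume", 2.3),
    ("Thigh Dixon Volume", "0"),
    ("MOST", 2.371),
]
summary = count_zero_echo_times(sample)
```

Series with a non-zero first count are the candidates for a private tag hunt.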
### Discussion Notes
From our initial conversation, we decided **not** to implement this tool immediately because:
1. Deciphering some private tags requires conditional logic based on the contents of certain
public tags (or other private tags). The exact rules are manufacturer-specific and need to
be rediscovered through hands-on exploration.
2. Building the wrong abstraction would be worse than no abstraction — we need to tinker with
real data first before committing to a tool design.
### Proposed Design (Single tool with three modes)
**`discover` mode** — Scan a file and list all private tag blocks with their creator strings.
Answers "what vendor modules are present?" Output: group number, creator string, tag count per block.
**`dump` mode** — Show all private tags within a specific creator block (or all private tags in a file).
For each tag: hex address, creator, VR, value. Binary values show first N bytes as hex + length.
Nested sequences show item count with optional one-level-deep recursion.
**`search` mode** — Scan across a directory looking for private tags matching a keyword in either
the creator string or the tag value. Useful for hunting down where manufacturers hide specific
parameters (e.g. "find any private tag with 'echo' in the creator or value").
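
As a starting point for `discover` mode, the sketch below groups private elements by creator block. It relies on the standard DICOM layout: private elements live in odd-numbered groups, a creator element at `(gggg,00xx)` reserves block `xx`, and the data elements of that block are `(gggg,xx00)` to `(gggg,xxFF)`. Everything else is an assumption for illustration: `discover_private_blocks` is a hypothetical name, the input is a plain list of `(group, element, value)` triples rather than a parsed file, and the creator strings are invented.

```python
from collections import Counter


def discover_private_blocks(elements, creator_filter=None):
    """Summarise private creator blocks.

    `elements` is an iterable of (group, element, value) triples.
    `creator_filter` is an optional case-insensitive substring.
    Hypothetical helper, not the server's actual implementation.
    """
    creators = {}       # (group, block) -> creator string
    counts = Counter()  # (group, block) -> number of private data elements
    for group, element, value in elements:
        if group % 2 == 0:
            continue  # even groups hold standard tags
        if 0x0010 <= element <= 0x00FF:
            # Private creator element: reserves block `element` in this group.
            creators[(group, element)] = str(value).strip()
        elif element >= 0x1000:
            # Private data element (gggg,xxyy): the high byte xx is the block.
            counts[(group, element >> 8)] += 1
    blocks = []
    for (group, block), creator in sorted(creators.items()):
        if creator_filter and creator_filter.lower() not in creator.lower():
            continue
        blocks.append({
            "group": f"{group:04X}",
            "block": f"{block:02X}",
            "creator": creator,
            "tag_count": counts[(group, block)],
        })
    return blocks


# Invented example data.
example = [
    (0x0018, 0x0081, 0.0),                 # public EchoTime, ignored
    (0x2005, 0x0010, "EXAMPLE VENDOR A"),  # creator for block 0x10
    (0x2005, 0x0011, "EXAMPLE VENDOR B"),  # creator for block 0x11
    (0x2005, 0x1010, b"\x00\x01"),
    (0x2005, 0x1011, 1),
    (0x2005, 0x1120, 2),
]
blocks = discover_private_blocks(example)
```

The same grouping could feed `dump` mode by keeping the elements of each block instead of only counting them.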
### Additional Considerations
- **Creator filtering:** Filter by creator substring, e.g. `creator="Philips"` to only see Philips blocks.
- **Known tag dictionaries:** Embed a small lookup table for commonly useful private tags
(e.g. Philips `(2005,xx10)` for actual echo times). Start without this and add later.
- **Binary value display:** Show first 64 bytes as hex + total length, rather than full dumps.
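
The binary display rule could look roughly like the following. `preview_binary` is a hypothetical helper name, and the exact output format (space-separated hex, a `...` marker, then the total length) is an assumption rather than a settled design.

```python
def preview_binary(value: bytes, max_bytes: int = 64) -> str:
    """Return a short hex preview of `value` plus its total length."""
    parts = []
    if value:
        # Hex of the first `max_bytes` bytes, space-separated (Python 3.8+).
        parts.append(value[:max_bytes].hex(" "))
    if len(value) > max_bytes:
        parts.append("...")  # signal that the preview is truncated
    parts.append(f"({len(value)} bytes)")
    return " ".join(parts)


short_preview = preview_binary(b"\x00\x01\x02", max_bytes=2)
```

Keeping the limit as a parameter makes it easy to tune if 64 bytes turns out to be too much or too little in practice.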
### Suggested Next Steps
1. Start by exploring the Erasmus Philips data with `dicom_get_metadata` using custom hex tags
to see what private blocks exist and specifically chase down the TE=0 mystery.
2. Do the same on Siemens and GE data to understand the differences.
3. Once the patterns and conditional logic are clear, design the tool around real use cases.
---
## 2. `dicom_compare_headers` Directory Mode
**Priority:** Medium — useful for cross-series protocol checks but less urgent than private tags.
### Problem
`dicom_compare_headers` currently requires 2–10 explicit file paths. For cross-series protocol
validation (e.g. "are all MOST series using the same TR/FA across a study?"), you have to
manually pick representative files from each series first.
### Proposed Enhancement
Add a **directory mode** that automatically picks one representative file per series and compares
them. This would enable single-call cross-series protocol checks.
### Design Ideas
- New parameter: `directory` as an alternative to `file_paths`
- Auto-select one file per unique SeriesInstanceUID (first file encountered, or configurable)
- Reuse existing comparison logic
- Show series description in output to identify which series each column represents
- Optionally filter which series to include (by description pattern or sequence type)
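
A minimal sketch of the representative selection, assuming the directory scan has already produced `(path, SeriesInstanceUID, SeriesDescription)` records. `pick_representatives` and its `description_pattern` parameter are hypothetical names, and "first file encountered" is used as the selection rule; the paths and UIDs in the example are invented.

```python
from fnmatch import fnmatchcase


def pick_representatives(records, description_pattern=None):
    """Pick the first file seen for each SeriesInstanceUID.

    `records` is an iterable of (path, series_uid, series_description).
    `description_pattern` is an optional shell-style pattern, matched
    case-insensitively against the series description.
    """
    chosen = {}  # series_uid -> (path, description), in first-seen order
    for path, series_uid, description in records:
        if description_pattern and not fnmatchcase(
            description.lower(), description_pattern.lower()
        ):
            continue  # series excluded by the description filter
        chosen.setdefault(series_uid, (path, description))
    return [
        {"series_uid": uid, "description": desc, "path": path}
        for uid, (path, desc) in chosen.items()
    ]


# Invented example records.
scan = [
    ("a/001.dcm", "1.2.3", "MOST 1"),
    ("a/002.dcm", "1.2.3", "MOST 1"),
    ("b/001.dcm", "1.2.4", "Thigh Dixon Volume"),
]
reps = pick_representatives(scan)
most_only = pick_representatives(scan, description_pattern="most*")
```

The resulting paths could then be passed to the existing comparison logic unchanged, with the description used as the column label.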
---
*Last updated: 2026-02-25 — after adding 4 new tools (dicom_dump_tree, dicom_compare_uids,
dicom_verify_segmentations, dicom_analyze_ti) and smoke testing against Philips, Siemens, and
GE MOLLI/NOLLI data.*