llm-tools/mcps/selenium_mcp/README.md
Gregory Gauthier 83ec950df7 first commit
2026-04-08 12:11:04 +01:00

# selenium_mcp
An MCP (Model Context Protocol) server for browser automation via Selenium WebDriver, served over **Streamable HTTP** transport.
## Features
- **Multi-session management** — run multiple independent browser sessions simultaneously
- **Full browser automation** — navigate, click, type, select, hover, scroll, go back/forward
- **Form filling** — fill multiple fields and submit in a single tool call
- **Content extraction** — get visible text, raw HTML, or all hyperlinks from a page
- **Screenshots** — viewport or full-page PNG captures returned as base64
- **JavaScript execution** — run arbitrary JS and get the return value
- **Smart waits** — wait for elements to be present, visible, clickable, or gone
- **Actionable errors** — every Selenium exception is mapped to a helpful suggestion
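
Because screenshots are returned as base64-encoded PNG data, a client has to decode them before saving or displaying them. The sketch below assumes the base64 string has already been extracted from the tool result (the exact shape of the result payload is not described here), and `save_screenshot` is a hypothetical helper, not part of this server.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def save_screenshot(b64_data: str, path: str) -> int:
    """Decode base64 PNG data, write it to `path`, and return the byte count.

    Hypothetical client-side helper; the server itself is not involved.
    """
    raw = base64.b64decode(b64_data)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    Path(path).write_bytes(raw)
    return len(raw)
```

Checking the PNG signature up front catches the case where an error message, rather than image data, was passed in by mistake.
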
## Tools
| Tool | Description |
|---|---|
| `selenium_navigate` | Navigate to a URL, optionally wait for an element |
| `selenium_click` | Click an element |
| `selenium_type` | Type text into an input/textarea |
| `selenium_select` | Select a dropdown option by value, text, or index |
| `selenium_find_elements` | Find elements and return their info |
| `selenium_screenshot` | Take a viewport or full-page screenshot |
| `selenium_get_page_content` | Extract text, HTML, or links from the page |
| `selenium_execute_script` | Run JavaScript in the browser |
| `selenium_wait_for` | Wait for an element condition |
| `selenium_fill_form` | Fill multiple form fields, then optionally submit |
| `selenium_scroll` | Scroll up/down/top/bottom |
| `selenium_back` / `selenium_forward` | Browser history navigation |
| `selenium_hover` | Hover over an element |
| `selenium_get_attribute` | Get all attributes/properties of an element |
| `selenium_list_sessions` | List active browser sessions |
| `selenium_close_session` | Close a browser session |
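
To illustrate how these tools compose, the sketch below runs a sequence of tool calls through an already-initialized `ClientSession`. The tool names come from the table above; the argument names (`url`, `selector`, `text`), the example URL, and the `run_steps` helper are assumptions for illustration only, so check each tool's input schema via `list_tools()` before relying on them.

```python
from typing import Any


async def run_steps(session: Any, steps: list[tuple[str, dict[str, Any]]]) -> list[Any]:
    """Call each (tool_name, arguments) pair in order and collect the results.

    `session` is expected to be an initialized mcp.ClientSession, or any
    object with a compatible async `call_tool(name, arguments)` method.
    """
    results = []
    for name, arguments in steps:
        results.append(await session.call_tool(name, arguments))
    return results


# Hypothetical argument names; the real tool schemas may differ.
LOGIN_STEPS = [
    ("selenium_navigate", {"url": "https://example.com/login"}),
    ("selenium_type", {"selector": "#username", "text": "alice"}),
    ("selenium_click", {"selector": "button[type=submit]"}),
]
```

Inside an `async with ClientSession(...)` block, this would be invoked as `await run_steps(session, LOGIN_STEPS)`.
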
## Requirements
- Python 3.10+
- Google Chrome / Chromium
- ChromeDriver (auto-managed by Selenium Manager in selenium >= 4.6)
## Installation
```bash
poetry install
```
## Running
```bash
# Default: headless Chrome on port 8000
poetry run selenium-mcp

# stdio mode
poetry run selenium-mcp-stdio

# Custom port and visible browser
SELENIUM_MCP_PORT=9000 SELENIUM_HEADLESS=false poetry run selenium-mcp
```
With the default settings, the MCP endpoint is available at `http://localhost:8000/mcp`. If you change `SELENIUM_MCP_PORT`, adjust the URL to match.
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SELENIUM_MCP_HOST` | `0.0.0.0` | Bind host |
| `SELENIUM_MCP_PORT` | `8000` | Bind port |
| `SELENIUM_HEADLESS` | `true` | Run Chrome in headless mode |
| `SELENIUM_WINDOW_WIDTH` | `1920` | Browser window width |
| `SELENIUM_WINDOW_HEIGHT` | `1080` | Browser window height |
| `CHROME_BINARY` | *(auto)* | Path to Chrome/Chromium binary |
| `CHROMEDRIVER_PATH` | *(auto)* | Path to ChromeDriver binary |
| `SELENIUM_SCREENSHOT_DIR` | `/tmp/selenium_screenshots` | Screenshot storage directory |
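
The variables and defaults in the table can be read as in the following sketch. This is not the server's actual code: the `Settings` dataclass, the `load_settings` function, and the rule that `1`, `true`, `yes`, and `on` all count as true are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    headless: bool
    window_width: int
    window_height: int
    chrome_binary: Optional[str]      # None means "let Selenium find it"
    chromedriver_path: Optional[str]  # None means "let Selenium Manager handle it"
    screenshot_dir: str


def _as_bool(value: str) -> bool:
    # Assumed parsing rule; the server may accept a different set of values.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the documented defaults."""
    return Settings(
        host=env.get("SELENIUM_MCP_HOST", "0.0.0.0"),
        port=int(env.get("SELENIUM_MCP_PORT", "8000")),
        headless=_as_bool(env.get("SELENIUM_HEADLESS", "true")),
        window_width=int(env.get("SELENIUM_WINDOW_WIDTH", "1920")),
        window_height=int(env.get("SELENIUM_WINDOW_HEIGHT", "1080")),
        chrome_binary=env.get("CHROME_BINARY"),
        chromedriver_path=env.get("CHROMEDRIVER_PATH"),
        screenshot_dir=env.get("SELENIUM_SCREENSHOT_DIR", "/tmp/selenium_screenshots"),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dictionary, for example `load_settings({"SELENIUM_MCP_PORT": "9000"})`.
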
## MCP Client Configuration
### Claude Code / Claude Desktop (via mcp-remote)
```json
{
  "mcpServers": {
    "selenium": {
      "type": "streamable-http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
```
### Programmatic Python client
```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


async def main():
    # Open the Streamable HTTP transport, then an MCP session on top of it.
    async with streamablehttp_client("http://localhost:8000/mcp") as (r, w, _):
        async with ClientSession(r, w) as session:
            await session.initialize()
            # List the tools the server exposes and print their names.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])


asyncio.run(main())
```
## Locator Strategies
All element-targeting tools accept a `by` parameter:
| Value | Selenium `By` |
|---|---|
| `css` (default) | `By.CSS_SELECTOR` |
| `xpath` | `By.XPATH` |
| `id` | `By.ID` |
| `name` | `By.NAME` |
| `tag_name` | `By.TAG_NAME` |
| `class_name` | `By.CLASS_NAME` |
| `link_text` | `By.LINK_TEXT` |
| `partial_link_text` | `By.PARTIAL_LINK_TEXT` |
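
The table above can be expressed as a simple lookup. The sketch below is an assumption about how such a mapping might look, not the server's actual code; `LOCATORS` and `resolve_locator` are hypothetical names. The right-hand strings are the values of the corresponding Selenium `By` constants, written out literally so the snippet runs without Selenium installed.

```python
# Maps the `by` parameter to the string value of the matching selenium By constant.
LOCATORS = {
    "css": "css selector",                     # By.CSS_SELECTOR
    "xpath": "xpath",                          # By.XPATH
    "id": "id",                                # By.ID
    "name": "name",                            # By.NAME
    "tag_name": "tag name",                    # By.TAG_NAME
    "class_name": "class name",                # By.CLASS_NAME
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
}


def resolve_locator(value: str, by: str = "css") -> tuple[str, str]:
    """Return a (strategy, value) pair for the given `by` name."""
    try:
        strategy = LOCATORS[by]
    except KeyError:
        allowed = ", ".join(sorted(LOCATORS))
        raise ValueError(f"unknown locator strategy {by!r}; expected one of: {allowed}") from None
    return (strategy, value)
```

The returned pair has the shape that Selenium's `driver.find_element(by, value)` expects, so it can be unpacked directly into that call.
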
## License
MIT