llm-tools/mcps/selenium_mcp/README.md

# selenium_mcp

An MCP (Model Context Protocol) server for browser automation via Selenium WebDriver, served over **Streamable HTTP** transport.

## Features

- **Multi-session management** — run multiple independent browser sessions simultaneously
- **Full browser automation** — navigate, click, type, select, hover, scroll, go back/forward
- **Form filling** — fill multiple fields and submit in a single tool call
- **Content extraction** — get visible text, raw HTML, or all hyperlinks from a page
- **Screenshots** — viewport or full-page PNG captures returned as base64
- **JavaScript execution** — run arbitrary JS and get the return value
- **Smart waits** — wait for elements to be present, visible, clickable, or gone
- **Actionable errors** — every Selenium exception is mapped to a helpful suggestion

## Tools

| Tool | Description |
|---|---|
| `selenium_navigate` | Navigate to a URL, optionally wait for an element |
| `selenium_click` | Click an element |
| `selenium_type` | Type text into an input/textarea |
| `selenium_select` | Select a dropdown option by value, text, or index |
| `selenium_find_elements` | Find elements and return their info |
| `selenium_screenshot` | Take a viewport or full-page screenshot |
| `selenium_get_page_content` | Extract text, HTML, or links from the page |
| `selenium_execute_script` | Run JavaScript in the browser |
| `selenium_wait_for` | Wait for an element condition |
| `selenium_fill_form` | Fill multiple form fields, then optionally submit |
| `selenium_scroll` | Scroll up/down/top/bottom |
| `selenium_back` / `selenium_forward` | Browser history navigation |
| `selenium_hover` | Hover over an element |
| `selenium_get_attribute` | Get all attributes/properties of an element |
| `selenium_list_sessions` | List active browser sessions |
| `selenium_close_session` | Close a browser session |

## Requirements

- Python 3.10+
- Google Chrome / Chromium
- ChromeDriver (auto-managed by Selenium Manager in selenium >= 4.6)

## Installation

```bash
poetry install
```

## Running

```bash
# Default: headless Chrome on port 8000
poetry run selenium-mcp

# STDIO MODE:
poetry run selenium-mcp-stdio

# Custom port and visible browser
SELENIUM_MCP_PORT=9000 SELENIUM_HEADLESS=false poetry run selenium-mcp
```

The MCP endpoint will be available at `http://localhost:8000/mcp`.

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SELENIUM_MCP_HOST` | `0.0.0.0` | Bind host |
| `SELENIUM_MCP_PORT` | `8000` | Bind port |
| `SELENIUM_HEADLESS` | `true` | Run Chrome in headless mode |
| `SELENIUM_WINDOW_WIDTH` | `1920` | Browser window width |
| `SELENIUM_WINDOW_HEIGHT` | `1080` | Browser window height |
| `CHROME_BINARY` | *(auto)* | Path to Chrome/Chromium binary |
| `CHROMEDRIVER_PATH` | *(auto)* | Path to ChromeDriver binary |
| `SELENIUM_SCREENSHOT_DIR` | `/tmp/selenium_screenshots` | Screenshot storage directory |

## MCP Client Configuration

### Claude Code / Claude Desktop (via mcp-remote)

```json
{
  "mcpServers": {
    "selenium": {
      "type": "streamable-http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
```

### Programmatic Python client

```python
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://localhost:8000/mcp") as (r, w, _):
        async with ClientSession(r, w) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())
```

## Locator Strategies

All element-targeting tools accept a `by` parameter:

| Value | Selenium `By` |
|---|---|
| `css` (default) | `By.CSS_SELECTOR` |
| `xpath` | `By.XPATH` |
| `id` | `By.ID` |
| `name` | `By.NAME` |
| `tag_name` | `By.TAG_NAME` |
| `class_name` | `By.CLASS_NAME` |
| `link_text` | `By.LINK_TEXT` |
| `partial_link_text` | `By.PARTIAL_LINK_TEXT` |

## License

MIT