# selenium_mcp

An MCP (Model Context Protocol) server for browser automation via Selenium WebDriver, served over **Streamable HTTP** transport.

## Features

- **Multi-session management** — run multiple independent browser sessions simultaneously
- **Full browser automation** — navigate, click, type, select, hover, scroll, go back/forward
- **Form filling** — fill multiple fields and submit in a single tool call
- **Content extraction** — get visible text, raw HTML, or all hyperlinks from a page
- **Screenshots** — viewport or full-page PNG captures returned as base64
- **JavaScript execution** — run arbitrary JS and get the return value
- **Smart waits** — wait for elements to be present, visible, clickable, or gone
- **Actionable errors** — every Selenium exception is mapped to a helpful suggestion

## Tools

| Tool | Description |
|---|---|
| `selenium_navigate` | Navigate to a URL, optionally wait for an element |
| `selenium_click` | Click an element |
| `selenium_type` | Type text into an input/textarea |
| `selenium_select` | Select a dropdown option by value, text, or index |
| `selenium_find_elements` | Find elements and return their info |
| `selenium_screenshot` | Take a viewport or full-page screenshot |
| `selenium_get_page_content` | Extract text, HTML, or links from the page |
| `selenium_execute_script` | Run JavaScript in the browser |
| `selenium_wait_for` | Wait for an element condition |
| `selenium_fill_form` | Fill multiple form fields, then optionally submit |
| `selenium_scroll` | Scroll up/down/top/bottom |
| `selenium_back` / `selenium_forward` | Browser history navigation |
| `selenium_hover` | Hover over an element |
| `selenium_get_attribute` | Get all attributes/properties of an element |
| `selenium_list_sessions` | List active browser sessions |
| `selenium_close_session` | Close a browser session |

## Requirements

- Python 3.10+
- Google Chrome / Chromium
- ChromeDriver (auto-managed by Selenium Manager in selenium >= 4.6)

## Installation

```bash
poetry install
```

## Running

```bash
# Default: headless Chrome on port 8000
poetry run selenium-mcp

# STDIO MODE:
poetry run selenium-mcp-stdio

# Custom port and visible browser
SELENIUM_MCP_PORT=9000 SELENIUM_HEADLESS=false poetry run selenium-mcp
```

The MCP endpoint will be available at `http://localhost:8000/mcp`.

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `SELENIUM_MCP_HOST` | `0.0.0.0` | Bind host |
| `SELENIUM_MCP_PORT` | `8000` | Bind port |
| `SELENIUM_HEADLESS` | `true` | Run Chrome in headless mode |
| `SELENIUM_WINDOW_WIDTH` | `1920` | Browser window width |
| `SELENIUM_WINDOW_HEIGHT` | `1080` | Browser window height |
| `CHROME_BINARY` | *(auto)* | Path to Chrome/Chromium binary |
| `CHROMEDRIVER_PATH` | *(auto)* | Path to ChromeDriver binary |
| `SELENIUM_SCREENSHOT_DIR` | `/tmp/selenium_screenshots` | Screenshot storage directory |

## MCP Client Configuration

### Claude Code / Claude Desktop (via mcp-remote)

```json
{
  "mcpServers": {
    "selenium": {
      "type": "streamable-http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
```

### Programmatic Python client

```python
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://localhost:8000/mcp") as (r, w, _):
        async with ClientSession(r, w) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])

asyncio.run(main())
```

## Locator Strategies

All element-targeting tools accept a `by` parameter:

| Value | Selenium `By` |
|---|---|
| `css` (default) | `By.CSS_SELECTOR` |
| `xpath` | `By.XPATH` |
| `id` | `By.ID` |
| `name` | `By.NAME` |
| `tag_name` | `By.TAG_NAME` |
| `class_name` | `By.CLASS_NAME` |
| `link_text` | `By.LINK_TEXT` |
| `partial_link_text` | `By.PARTIAL_LINK_TEXT` |

## License

MIT
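
## Appendix: Locator Strategy Lookup Sketch

The Locator Strategies table maps each `by` value to a Selenium `By` constant. The snippet below is a minimal sketch of how such a lookup might work. It is not taken from this server's source: the names `LOCATOR_STRATEGIES` and `resolve_by` are hypothetical, and the error-handling behavior is an assumption. The string values are the ones Selenium's `By` class defines for each constant, so the sketch runs without Selenium installed.

```python
# Hypothetical lookup table mirroring the Locator Strategies table.
# Each value is the string behind the corresponding selenium `By` constant.
LOCATOR_STRATEGIES = {
    "css": "css selector",                     # By.CSS_SELECTOR
    "xpath": "xpath",                          # By.XPATH
    "id": "id",                                # By.ID
    "name": "name",                            # By.NAME
    "tag_name": "tag name",                    # By.TAG_NAME
    "class_name": "class name",                # By.CLASS_NAME
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
}


def resolve_by(by: str = "css") -> str:
    """Translate a tool-level `by` value into a Selenium locator string.

    Defaults to "css", matching the default listed in the table.
    Raising ValueError on unknown values is an assumption of this sketch.
    """
    try:
        return LOCATOR_STRATEGIES[by]
    except KeyError:
        valid = ", ".join(sorted(LOCATOR_STRATEGIES))
        raise ValueError(
            f"Unknown locator strategy {by!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    print(resolve_by())             # default strategy
    print(resolve_by("link_text"))
```

The returned string could be passed as the first argument to Selenium's `driver.find_element(strategy, value)`, which accepts these locator strings directly.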