Selenium

An MCP implementation for Selenium WebDriver

Publisher: angiejones
Repository: mcp-selenium
Language: JavaScript
Forks: 118
Stars: 400
Available tools: 14
Transport type: stdio
License: MIT
  • Connect tools to AI workflows

    Selenium exposes MCP capabilities that can be used by compatible AI clients and agents.

  • 14 available tools

    Browse the callable actions below, including names and descriptions when provided by the server.

  • Ready-to-copy setup

    Use the installation snippets to configure this server in your preferred MCP client.

  • Open source signals

    400 stars and 118 forks from the linked repository.

MCP Selenium Server

A Model Context Protocol (MCP) server for Selenium WebDriver — browser automation for AI agents.

Setup

Goose (paste into your browser address bar):

goose://extension?cmd=npx&arg=-y&arg=%40angiejones%2Fmcp-selenium%40latest&id=selenium-mcp&name=Selenium%20MCP&description=automates%20browser%20interactions

Goose CLI:

```bash
goose session --with-extension "npx -y @angiejones/mcp-selenium@latest"
```

Claude Code:

```bash
claude mcp add selenium -- npx -y @angiejones/mcp-selenium@latest
```

Other MCP clients (JSON configuration):

```json
{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": ["-y", "@angiejones/mcp-selenium@latest"]
    }
  }
}
```
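
Many MCP clients keep several servers under one `mcpServers` object, so the Selenium entry usually has to be merged in rather than pasted over an existing file. The sketch below shows one way to do that merge in memory. The function name `addSeleniumServer` and the `filesystem` entry are illustrative assumptions, not part of this project, and reading or writing the actual config file is left to your client's conventions.

```javascript
// Add the Selenium server to an MCP client config without touching other entries.
// Returns a new object; the input config is not mutated.
function addSeleniumServer(config = {}) {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      selenium: {
        command: "npx",
        args: ["-y", "@angiejones/mcp-selenium@latest"],
      },
    },
  };
}

// Hypothetical existing config with one unrelated server.
const existing = {
  mcpServers: {
    filesystem: { command: "npx", args: ["-y", "some-other-server"] },
  },
};

console.log(JSON.stringify(addSeleniumServer(existing), null, 2));
```

Returning a copy keeps the original config intact, which makes it easy to diff before saving.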

Example Usage

Tell the AI agent of your choice:

Open Chrome, go to github.com/angiejones, and take a screenshot.

The agent calls the server's start_browser, navigate, and take_screenshot tools in turn. No manual scripting or step-by-step directions are needed.
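
For illustration, the prompt above could translate into three MCP `tools/call` requests. The sketch below only builds the JSON-RPC messages; the helper name `toolCall` and the request ids are assumptions, and in practice your MCP client constructs and sends these for you over stdio.

```javascript
// Build an MCP "tools/call" JSON-RPC request (helper name is hypothetical).
function toolCall(id, name, args = {}) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The sequence an agent might issue for the example prompt.
const requests = [
  toolCall(1, "start_browser", { browser: "chrome" }),
  toolCall(2, "navigate", { url: "https://github.com/angiejones" }),
  toolCall(3, "take_screenshot"), // no outputPath, so image data is returned
];

for (const request of requests) {
  console.log(JSON.stringify(request));
}
```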

Supported Browsers

Chrome, Firefox, Edge, and Safari.

Safari note: Requires macOS. Run sudo safaridriver --enable once and enable "Allow Remote Automation" in Safari → Settings → Developer. No headless mode.


Tools

start_browser

Launches a browser session.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| browser | string | Yes | chrome, firefox, edge, or safari |
| options | object | No | `{ headless: boolean, arguments: string[] }` |
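
As a sketch of how these parameters fit together, the helper below assembles `start_browser` arguments and rejects combinations the notes above rule out, such as headless Safari. The function name `startBrowserArgs` is an assumption, and the check is purely client-side; how the server itself responds to such input is not described here.

```javascript
const SUPPORTED_BROWSERS = ["chrome", "firefox", "edge", "safari"];

// Assemble arguments for the start_browser tool (helper name is hypothetical).
function startBrowserArgs(browser, options = {}) {
  if (!SUPPORTED_BROWSERS.includes(browser)) {
    throw new Error(`Unsupported browser: ${browser}`);
  }
  if (browser === "safari" && options.headless) {
    throw new Error("Safari has no headless mode");
  }
  // Omit the optional `options` object when nothing was requested.
  return Object.keys(options).length > 0 ? { browser, options } : { browser };
}

console.log(
  startBrowserArgs("chrome", {
    headless: true,
    arguments: ["--window-size=1280,800"],
  })
);
```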

navigate

Navigates to a URL.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to navigate to |

interact

Performs a mouse action on an element.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | click, doubleclick, rightclick, or hover |
| by | string | Yes | Locator strategy: id, css, xpath, name, tag, class |
| value | string | Yes | Value for the locator strategy |
| timeout | number | No | Max wait in ms (default: 10000) |
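
The same `by`/`value` locator pair recurs across most element tools. The sketch below builds `interact` arguments and checks the action and locator strategy against the values listed in the table. The helper name `interactArgs` and the example selectors are assumptions made for illustration.

```javascript
const MOUSE_ACTIONS = ["click", "doubleclick", "rightclick", "hover"];
const LOCATOR_STRATEGIES = ["id", "css", "xpath", "name", "tag", "class"];

// Assemble arguments for the interact tool (helper name is hypothetical).
function interactArgs(action, by, value, timeout) {
  if (!MOUSE_ACTIONS.includes(action)) {
    throw new Error(`Unsupported action: ${action}`);
  }
  if (!LOCATOR_STRATEGIES.includes(by)) {
    throw new Error(`Unsupported locator strategy: ${by}`);
  }
  const args = { action, by, value };
  // Leave timeout out to get the documented 10000 ms default.
  if (timeout !== undefined) args.timeout = timeout;
  return args;
}

console.log(interactArgs("click", "css", "button[type='submit']"));
console.log(interactArgs("hover", "id", "main-menu", 3000));
```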

send_keys

Types text into an element. Clears the field first.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| by | string | Yes | Locator strategy |
| value | string | Yes | Locator value |
| text | string | Yes | Text to enter |
| timeout | number | No | Max wait in ms (default: 10000) |

get_element_text

Gets the text content of an element.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| by | string | Yes | Locator strategy |
| value | string | Yes | Locator value |
| timeout | number | No | Max wait in ms (default: 10000) |

get_element_attribute

Gets an attribute value from an element.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| by | string | Yes | Locator strategy |
| value | string | Yes | Locator value |
| attribute | string | Yes | Attribute name (e.g., href, value, class) |
| timeout | number | No | Max wait in ms (default: 10000) |

press_key

Presses a keyboard key.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | Yes | Key to press (e.g., Enter, Tab, a) |

upload_file

Uploads a file via a file input element.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| by | string | Yes | Locator strategy |
| value | string | Yes | Locator value |
| filePath | string | Yes | Absolute path to the file |
| timeout | number | No | Max wait in ms (default: 10000) |

take_screenshot

Captures a screenshot of the current page.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| outputPath | string | No | Save path. If omitted, returns base64 image data. |
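
When `outputPath` is omitted, the caller receives base64 text and has to decode it. The sketch below decodes such a string and checks for the PNG signature, assuming the image is a PNG (the format WebDriver screenshots normally use). The function names are illustrative, and the sample string is only the 8-byte PNG signature, not a real screenshot.

```javascript
// First eight bytes of every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode base64 image data into raw bytes (helper name is hypothetical).
function decodeScreenshot(base64) {
  return Buffer.from(base64, "base64");
}

// Cheap sanity check before writing the bytes to disk.
function looksLikePng(bytes) {
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}

// Stand-in for real tool output: just the PNG signature, base64-encoded.
const sample = PNG_SIGNATURE.toString("base64");
const bytes = decodeScreenshot(sample);
console.log(looksLikePng(bytes));
```

From here the bytes could be written with Node's `fs.writeFileSync`, or passed along to whatever consumes the image.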

close_session

Closes the current browser session. No parameters.

execute_script

Executes JavaScript in the browser. Use for advanced interactions not covered by other tools (e.g., drag and drop, scrolling, reading computed styles, DOM manipulation).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| script | string | Yes | JavaScript code to execute |
| args | array | No | Arguments accessible via arguments[0], etc. |
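
The script is treated as a function body, which is why values in `args` show up as `arguments[0]`, `arguments[1]`, and so on. The sketch below shows a plausible scrolling payload and then mimics the argument binding locally with the `Function` constructor. `runLocally` is a hypothetical stand-in for experimentation only; it runs in Node, not in a browser, so it cannot touch `window` or the DOM.

```javascript
// A payload that scrolls the page down by the number of pixels in args[0].
const scrollCall = {
  script: "window.scrollBy(0, arguments[0]);",
  args: [600],
};

// Mimic how a script body receives its arguments (local stand-in, not the real tool).
function runLocally(script, args = []) {
  return new Function(script).apply(null, args);
}

console.log(JSON.stringify(scrollCall));
console.log(runLocally("return arguments[0] * arguments[1];", [6, 7])); // 42
```

Use `return` in the script when you want a value back, as in the second example.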

window

Manages browser windows and tabs.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | list, switch, switch_latest, or close |
| handle | string | No | Window handle (required for switch) |
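
`switch_latest` covers the common case of a link opening a new tab. When you need an explicit handle instead, one approach is to call `list` before and after the click and diff the results. The sketch below assumes `list` yields an array of handle strings, which is not confirmed here; `findNewHandle` and the handle values are placeholders.

```javascript
// Return the handle present in `after` but not in `before`, or null if none.
function findNewHandle(before, after) {
  const known = new Set(before);
  return after.find((handle) => !known.has(handle)) ?? null;
}

// Placeholder handles; real ones are opaque strings issued by the browser.
const handlesBefore = ["window-A"];
const handlesAfter = ["window-A", "window-B"];

const newHandle = findNewHandle(handlesBefore, handlesAfter);
const switchArgs = { action: "switch", handle: newHandle };
console.log(switchArgs);
```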

frame

Switches focus to a frame or back to the main page.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | switch or default |
| by | string | No | Locator strategy (for switch) |
| value | string | No | Locator value (for switch) |
| index | number | No | Frame index, 0-based (for switch) |
| timeout | number | No | Max wait in ms (default: 10000) |

alert

Handles browser alert, confirm, or prompt dialogs.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | accept, dismiss, get_text, or send_text |
| text | string | No | Text to send (required for send_text) |
| timeout | number | No | Max wait in ms (default: 5000) |

add_cookie

Adds a cookie. Browser must be on a page from the cookie's domain.

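| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Cookie name |
| value | string | Yes | Cookie value |
| domain | string | No | Cookie domain |
| path | string | No | Cookie path |
| secure | boolean | No | Secure flag |
| httpOnly | boolean | No | HTTP-only flag |
| expiry | number | No | Unix timestamp |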
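
A frequent slip with `expiry` is passing milliseconds from `Date.now()` directly. The sketch below derives an expiry a given number of seconds in the future, assuming "Unix timestamp" here means whole seconds since the epoch (the unit WebDriver cookies use). The helper name `cookieArgs` and the cookie values are illustrative.

```javascript
// Build add_cookie arguments that expire `ttlSeconds` after `nowMs`.
// Assumption: expiry is expressed in seconds, not milliseconds.
function cookieArgs(name, value, ttlSeconds, nowMs = Date.now()) {
  return {
    name,
    value,
    path: "/",
    secure: true,
    expiry: Math.floor(nowMs / 1000) + ttlSeconds,
  };
}

// A session cookie valid for one hour from now.
console.log(cookieArgs("session_id", "abc123", 3600));
```

Remember that the browser must already be on a page from the cookie's domain before the call is made.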

get_cookies

Gets cookies. Returns all or a specific one by name.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Cookie name. Omit for all cookies. |

delete_cookie

Deletes cookies. Deletes all or a specific one by name.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Cookie name. Omit to delete all. |

diagnostics

Gets browser diagnostics captured via WebDriver BiDi (auto-enabled when supported).

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | console, errors, or network |
| clear | boolean | No | Clear buffer after returning (default: false) |

Resources

MCP resources provide read-only data that clients can access without calling a tool.

browser-status://current

Returns the current browser session status (active session ID or "no active session").

| Property | Value |
| --- | --- |
| MIME type | text/plain |
| Requires browser | No |

accessibility://current

Returns an accessibility tree snapshot of the current page — a compact, structured JSON representation of interactive elements and text content. Much smaller than full HTML. Useful for understanding page layout and finding elements to interact with.

| Property | Value |
| --- | --- |
| MIME type | application/json |
| Requires browser | Yes |
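
Resources are fetched with an MCP `resources/read` request rather than a tool call. The sketch below builds that request for both URIs above; the helper name `resourceRead` and the request ids are assumptions, and an MCP client would normally send these on your behalf.

```javascript
// Build an MCP "resources/read" JSON-RPC request (helper name is hypothetical).
function resourceRead(id, uri) {
  return { jsonrpc: "2.0", id, method: "resources/read", params: { uri } };
}

// Works without a browser session.
const statusRequest = resourceRead(1, "browser-status://current");

// Needs an active browser session, so start one and navigate first.
const treeRequest = resourceRead(2, "accessibility://current");

console.log(JSON.stringify(statusRequest));
console.log(JSON.stringify(treeRequest));
```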

Development setup

```bash
git clone https://github.com/angiejones/mcp-selenium.git
cd mcp-selenium
npm install
```

Run Tests

```bash
npm test
```

Requires Chrome + chromedriver on PATH. Tests run headless.

Install via Smithery

```bash
npx -y @smithery/cli install @angiejones/mcp-selenium --client claude
```

Install globally

```bash
npm install -g @angiejones/mcp-selenium
mcp-selenium
```

License

MIT

Installation

TypingMind
Prerequisites:

Node.js 18+

```json
{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": [
        "-y",
        "@angiejones/mcp-selenium"
      ]
    }
  }
}
```

Available Tools

  • start_browser

    launches browser

  • navigate

    navigates to a URL

  • find_element

    finds an element

  • click_element

    clicks an element

  • send_keys

    sends keys to an element, aka typing

  • get_element_text

    gets the text() of an element

  • hover

    moves the mouse to hover over an element

  • drag_and_drop

    drags an element and drops it onto another element

  • double_click

    performs a double click on an element

  • right_click

    performs a right click (context click) on an element

  • press_key

    simulates pressing a keyboard key

  • upload_file

    uploads a file using a file input element

  • take_screenshot

    captures a screenshot of the current page

  • close_session

    closes the current browser session

Use Selenium MCP with multiple AI models

TypingMind connects MCP tools at the workspace level, so once Selenium is connected, you can use it with different AI models in TypingMind instead of setting it up separately for each model. This MCP runs locally through the TypingMind MCP connector on your device.

Setup guide to use the local connector

Use this when the MCP server needs access to local files, apps, or private resources on your computer.

Step 1: Open the MCP settings

In TypingMind, go to Settings, Advanced Settings, then Model Context Protocol and choose Setup Connector.

  1. Open TypingMind in your browser.
  2. Click the Settings icon.
  3. Go to Advanced Settings.
  4. Open the Model Context Protocol section.
  5. Click Setup Connector and choose This Device.

Step 2: Run the connector command

Choose This Device, copy the command from TypingMind, and run it in Terminal. Keep the process running while you use MCP.

  1. Copy the setup command shown by TypingMind.
  2. Open Terminal on macOS or Windows Terminal on Windows.
  3. Paste and run the command.
  4. Approve the package install if Terminal asks you to proceed.
  5. Keep the Terminal window running while using MCP tools.

Step 3: Add Selenium as a server

When the connector status is Ready, click Edit Servers and paste the MCP server configuration.

  1. Wait until the connector status shows Ready.
  2. Click Edit Servers.
  3. Paste the Selenium MCP server configuration.
  4. Save the server list.
  5. Refresh if you want to confirm the connector is still ready.

```json
{
  "mcpServers": {
    "selenium": {
      "command": "npx",
      "args": [
        "-y",
        "@angiejones/mcp-selenium"
      ]
    }
  }
}
```

Step 4: Use it across models

Save the server list, open Plugins, enable the Selenium MCP tools, then select any supported AI model in TypingMind and use the tools in chat or assign them to an AI agent.

  1. Open the Plugins page in TypingMind.
  2. Enable the Selenium MCP tools.
  3. Start a chat and choose the AI model you want to use.
  4. Use the MCP tools in chat or assign them to an AI agent.
  5. Switch to another AI model whenever needed without reconnecting MCP.

Frequently asked questions

What is the Selenium MCP server used for?

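Selenium MCP is a Model Context Protocol server for Selenium WebDriver. It lets compatible AI clients and agents automate a browser: launch it, navigate to pages, interact with elements, and capture screenshots. In TypingMind, you can add this MCP server once and make its tools available in your AI workspace.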

Can I use Selenium MCP with multiple AI models in TypingMind?

Yes. TypingMind connects MCP tools at the workspace level, so you can use Selenium with different AI models such as Claude, ChatGPT, Gemini, or other models you have configured in TypingMind without setting up the MCP server separately for each model.

Why use Selenium MCP with TypingMind?

TypingMind brings multiple AI models, prompts, plugins, AI agents, API keys, and MCP tools into one workspace. With Selenium connected, you can use its MCP tools across your preferred models while keeping your chat workflow organized in TypingMind.

How do I connect Selenium MCP to TypingMind?

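Selenium runs through the TypingMind local MCP connector: start the connector on your device, paste the Selenium server configuration under Edit Servers, then enable the tools on the Plugins page. The local connector is the right choice when an MCP server needs access to local files, desktop apps, command-line tools, or private resources on your computer.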

What tools does Selenium MCP provide in TypingMind?

Selenium exposes 14 MCP tools that can be enabled from the TypingMind Plugins page and used in chat or assigned to AI agents.

Do I need to share my API keys with TypingMind to use Selenium MCP?

No. TypingMind is local-first and lets you keep your model providers, API keys, prompts, and MCP configuration under your control. If Selenium requires authentication, add the required headers, OAuth settings, or local configuration for that MCP server when you create the connection.
