Selenium MCP Server

This server bridges MCP clients (AI assistants) and Selenium WebDriver. It exposes Selenium WebDriver functionality as MCP tools, which AI models can use for tasks such as:

Browser management (launching, navigating, closing browsers)
Element interaction (clicking, typing, finding elements)
Web scraping and automated testing
Advanced operations like screenshots, cookie management, and JavaScript execution

In short, an AI assistant communicates with the Selenium MCP server over the Model Context Protocol, and the server drives Selenium WebDriver on its behalf. This enables AI-controlled web interaction, automated testing, and data extraction.

🚀 Overview

A Model Context Protocol (MCP) server for Selenium that provides comprehensive Selenium WebDriver automation tools for AI assistants and applications. This server enables automated web browser interactions, testing, and scraping through a standardized interface.

Built with TypeScript and modern ES modules, it offers type-safe browser automation capabilities through the Model Context Protocol.

✨ Key Features

Multi-Browser Support: Chrome, Firefox, Safari, and Edge browser automation
Comprehensive Element Interaction: Click, type, hover, drag & drop, file uploads
Advanced Navigation: Forward, backward, refresh, window management
Wait Strategies: Intelligent waiting for elements and page states
Type Safety: Full TypeScript implementation with Zod validation

🤝 Integration

MCP Client Integration

Configure your MCP client to connect to the Selenium server:

Standard Configuration (applicable to Windsurf, Warp, Gemini CLI, etc.)

json
{
  "servers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["-y", "selenium-webdriver-mcp@latest"]
    }
  }
}

Installation in VS Code

Update your mcp.json in VS Code with the configuration below.

NOTE: If you're new to MCP servers, see the VS Code guide "Use MCP servers in VS Code".

Example 'stdio' type connection

json
{
  "servers": {
    "selenium-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "selenium-webdriver-mcp@latest"
      ],
      "type": "stdio"
    }
  },
  "inputs": []
}

Example 'http' type connection

json
{
  "servers": {
    "Selenium": {
      "url": "https://smithery.ai/server/@pshivapr/selenium-mcp",
      "type": "http"
    }
  },
  "inputs": []
}
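
The two VS Code examples above differ only in the shape of each server entry. As a rough sketch, those shapes could be modeled in TypeScript as a discriminated union on `type`. The names `StdioServer`, `HttpServer`, `ServerEntry`, and `describeServer` are illustrative assumptions, not part of this project or of VS Code.

```typescript
// Illustrative types only: they mirror the two mcp.json examples above,
// not an official VS Code or selenium-mcp schema.
interface StdioServer {
  type: "stdio";
  command: string;
  args: string[];
}

interface HttpServer {
  type: "http";
  url: string;
}

type ServerEntry = StdioServer | HttpServer;

// Summarize how a client would reach the server described by an entry.
function describeServer(serverName: string, entry: ServerEntry): string {
  switch (entry.type) {
    case "stdio":
      return `${serverName}: spawn "${[entry.command, ...entry.args].join(" ")}"`;
    case "http":
      return `${serverName}: connect to ${entry.url}`;
  }
}

const stdioEntry: ServerEntry = {
  type: "stdio",
  command: "npx",
  args: ["-y", "selenium-webdriver-mcp@latest"],
};

const httpEntry: ServerEntry = {
  type: "http",
  url: "https://smithery.ai/server/@pshivapr/selenium-mcp",
};
```

With `stdio` the client launches the server as a local process; with `http` it connects to an already running endpoint.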

After installation, the Selenium MCP server will be available for use with your GitHub Copilot agent in VS Code.

To install the Selenium MCP server using the VS Code CLI

bash
# For VS Code
code --add-mcp '{"name":"selenium-mcp","command":"npx","args":["selenium-webdriver-mcp@latest"]}'

bash
# For VS Code Insiders
code-insiders --add-mcp '{"name":"selenium-mcp","command":"npx","args":["selenium-webdriver-mcp@latest"]}'
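
Hand-quoting JSON on a command line is easy to get wrong. One way to avoid that, sketched below, is to generate the `--add-mcp` argument with `JSON.stringify` and quote it for a POSIX shell. The helper names `buildAddMcpPayload` and `shellQuote` are hypothetical and exist only for this illustration.

```typescript
// Hypothetical helper: builds the JSON payload passed to `--add-mcp`.
interface AddMcpPayload {
  name: string;
  command: string;
  args: string[];
}

function buildAddMcpPayload(payload: AddMcpPayload): string {
  return JSON.stringify(payload);
}

// Wrap a string in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

const addMcpPayload = buildAddMcpPayload({
  name: "selenium-mcp",
  command: "npx",
  args: ["selenium-webdriver-mcp@latest"],
});

// The full command line, ready to paste into a POSIX shell.
const commandLine = `code --add-mcp ${shellQuote(addMcpPayload)}`;
```

Inside single quotes a POSIX shell passes double quotes through unchanged, so the JSON needs no backslash escapes.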

To install the package using either npm or Smithery:

Using npm:

bash
npm install -g selenium-webdriver-mcp@latest

Using Smithery

To install Selenium MCP for Claude Desktop automatically via Smithery:

bash
npx @smithery/cli install @pshivapr/selenium-mcp --client claude

Claude Desktop Integration

Add to your Claude Desktop configuration:

json
{
  "mcpServers": {
    "selenium-mcp": {
      "command": "npx",
      "args": ["-y", "selenium-webdriver-mcp@latest"]
    }
  }
}
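
If the Claude Desktop configuration already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that merge without mutating the existing object. `ClaudeDesktopConfig` and `withSeleniumServer` are illustrative names, and the interface covers only the `mcpServers` key shown above.

```typescript
// Illustrative shape of the Claude Desktop config shown above.
interface LaunchEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, LaunchEntry>;
  [otherKey: string]: unknown;
}

// Return a new config with the Selenium server added, leaving the input untouched.
function withSeleniumServer(config: ClaudeDesktopConfig): ClaudeDesktopConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      "selenium-mcp": {
        command: "npx",
        args: ["-y", "selenium-webdriver-mcp@latest"],
      },
    },
  };
}

// Example: a config that already has one (made-up) server entry.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["server.js"] } },
};
const mergedConfig = withSeleniumServer(existingConfig);
```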

Screenshot

[Screenshot: Selenium + Claude]

Prompts

An example prompt to start AI Agent interaction:

Using Selenium MCP tools, navigate to <https://parabank.parasoft.com/>, click the 'Register' link, sign up using dynamic test data, and click Register. Then generate Selenium tests in <YOUR_FAVOURITE_PROGRAMMING_LANGUAGE> using the Page Object Model (POM), create tests using Cucumber features and steps, and execute the tests.

Note: For more prompts, see the examples directory of the project.
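
The AI assistant decides which tools to call, so the exact sequence varies between runs. Purely as an illustration, the first steps of the prompt above might translate into a plan like the one below. The tool names come from this README; the `browser` value, the locator values, and the test data are guesses and were not checked against the site.

```typescript
// One planned tool invocation: a tool name from this README plus its arguments.
interface ToolStep {
  tool: string;
  args: Record<string, unknown>;
}

// Hypothetical plan; argument values are assumptions for illustration only.
const registrationPlan: ToolStep[] = [
  { tool: "browser_open", args: { browser: "chrome" } },
  { tool: "browser_navigate", args: { url: "https://parabank.parasoft.com/" } },
  { tool: "browser_click", args: { by: "xpath", value: "//a[text()='Register']" } },
  { tool: "browser_type", args: { by: "id", value: "customer.firstName", text: "Ada" } },
];

// Render the plan as numbered, human-readable lines.
function summarize(plan: ToolStep[]): string[] {
  return plan.map((step, i) => `${i + 1}. ${step.tool}(${JSON.stringify(step.args)})`);
}
```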

🛠️ MCP Available Tools

Browser Management Tools

Tool	Description	Parameters
`browser_open`	Open a new browser session	`browser`, `options`
`browser_navigate`	Navigate to a URL	`url`
`browser_navigate_back`	Navigate back in history	None
`browser_navigate_forward`	Navigate forward in history	None
`browser_title`	Get the current page title	None
`browser_refresh`	Refresh the current page	None
`browser_get_url`	Get the current page URL	None
`browser_get_page_source`	Get the current page HTML source	None
`browser_maximize`	Maximize the browser window	None
`browser_resize`	Resize browser window	`width`, `height`
`browser_close`	Close current browser session	None
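
MCP clients normally build tool invocations for you, but it can help to see what goes over the wire. In MCP, a tool is invoked with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below constructs such requests for three of the tools listed above. The helper `toolCall` is illustrative, and `"chrome"` is an assumed value for the `browser` parameter.

```typescript
// JSON-RPC 2.0 request shape used by MCP for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Build a tools/call request; tools with no parameters get an empty object.
function toolCall(toolName: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const openRequest = toolCall("browser_open", { browser: "chrome" }); // "chrome" is an assumption
const navigateRequest = toolCall("browser_navigate", { url: "https://example.com" });
const titleRequest = toolCall("browser_title");
```

Each request carries its own `id` so the client can match the server's response to the call that produced it.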

Cookie Management Tools

Tool	Description	Parameters
`browser_get_cookies`	Get all cookies from the current browser session	None
`browser_get_cookie_by_name`	Get a specific cookie by name	`cookie` (cookie name)
`browser_add_cookie_by_name`	Add a new cookie to the browser	`cookie` (cookie name), `value`
`browser_set_cookie_object`	Set a cookie object in the browser	`cookie` (cookie object as string)
`browser_delete_cookie`	Delete a specific cookie by name	`value` (cookie name to delete)
`browser_delete_cookies`	Delete all cookies from the current browser session	None
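
The table above lists two ways to create a cookie: by name and value, or as a cookie object passed as a string. The sketch below builds arguments for both. It assumes the string form is JSON and that the object uses field names from the WebDriver cookie model; neither detail is confirmed by this README, so treat `CookieSpec` and both helpers as hypothetical.

```typescript
// Assumed cookie fields, following the WebDriver cookie model; the exact
// fields this server accepts are not listed in this README.
interface CookieSpec {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  secure?: boolean;
  httpOnly?: boolean;
}

// Arguments for `browser_set_cookie_object`: the cookie serialized as a string
// (JSON is an assumption here).
function setCookieObjectArgs(cookie: CookieSpec): { cookie: string } {
  return { cookie: JSON.stringify(cookie) };
}

// Arguments for `browser_add_cookie_by_name`: name and value as separate parameters.
function addCookieByNameArgs(cookieName: string, value: string): { cookie: string; value: string } {
  return { cookie: cookieName, value };
}

const objectArgs = setCookieObjectArgs({ name: "session", value: "abc123", path: "/" });
const byNameArgs = addCookieByNameArgs("theme", "dark");
```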

Window Management Tools

Tool	Description	Parameters
`browser_switch_to_window`	Switch to a different browser window by handle	`windowHandle`
`browser_switch_to_original_window`	Switch back to the original browser window	None
`browser_switch_to_window_by_title`	Switch to a window by its page title	`title`
`browser_switch_window_by_index`	Switch to a window by its index position	`index`
`browser_switch_to_window_by_url`	Switch to a window by its URL	`url`

Element Interaction Tools

Tool	Description	Parameters
`browser_find_element`	Find an element on the page	`by`, `value`, `timeout`
`browser_find_elements`	Find multiple elements on the page	`by`, `value`, `timeout`
`browser_click`	Click on an element	`by`, `value`, `timeout`
`browser_type`	Type text into an element	`by`, `value`, `text`, `timeout`
`browser_get_element_text`	Get text content of element	`by`, `value`, `timeout`
`browser_file_upload`	Upload file via input element	`by`, `value`, `filePath`, `timeout`
`browser_clear`	Clear text from an element	`by`, `value`, `timeout`
`browser_get_attribute`	Get element attribute value	`by`, `value`, `attribute`, `timeout`

Element State Validation Tools

Tool	Description	Parameters
`browser_element_is_displayed`	Check if an element is visible on the page	`by`, `value`, `timeout`
`browser_element_is_enabled`	Check if an element is enabled for interaction	`by`, `value`, `timeout`
`browser_element_is_selected`	Check if an element is selected (checkboxes, radio buttons)	`by`, `value`, `timeout`

Frame Management Tools

Tool	Description	Parameters
`browser_switch_to_frame`	Switch to an iframe element	`by`, `value`, `timeout`
`browser_switch_to_parent_frame`	Switch to the parent frame (from nested iframe)	None
`browser_switch_to_default_content`	Switch back to the main page content	None
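
The three frame tools are easiest to reason about as moves along a path of nested iframes. The sketch below is a toy model of that idea, not the server's implementation: it reduces a frame to a single label (the real tool takes `by`, `value`, `timeout`) and tracks where a sequence of calls leaves you.

```typescript
// A frame context modeled as the path of iframes entered from the main page.
type FramePath = readonly string[];

// Simplified actions: `frame` is a label standing in for the real locator.
type FrameAction =
  | { tool: "browser_switch_to_frame"; frame: string }
  | { tool: "browser_switch_to_parent_frame" }
  | { tool: "browser_switch_to_default_content" };

function applyFrameAction(path: FramePath, action: FrameAction): FramePath {
  switch (action.tool) {
    case "browser_switch_to_frame":
      return [...path, action.frame]; // go one level deeper
    case "browser_switch_to_parent_frame":
      return path.slice(0, -1); // go one level up (no change at the top level)
    case "browser_switch_to_default_content":
      return []; // back to the main page
  }
}

// Enter two nested iframes, then step back out one level.
const frameActions: FrameAction[] = [
  { tool: "browser_switch_to_frame", frame: "outer" },
  { tool: "browser_switch_to_frame", frame: "inner" },
  { tool: "browser_switch_to_parent_frame" },
];
const finalFramePath = frameActions.reduce(
  (path: FramePath, action) => applyFrameAction(path, action),
  [],
);
```

Here `finalFramePath` ends at the outer iframe: the parent switch undoes only the most recent frame switch, whereas switching to default content would discard the whole path.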

Advanced Action Tools

Tool	Description	Parameters
`browser_hover`	Hover over an element	`by`, `value`, `timeout`
`browser_double_click`	Double-click on an element	`by`, `value`, `timeout`
`browser_right_click`	Right-click (context menu)	`by`, `value`, `timeout`
`browser_drag_and_drop`	Drag from source to target	`by`, `value`, `targetBy`, `targetValue`, `timeout`
`browser_wait_for_element`	Wait for element to appear	`by`, `value`, `timeout`
`browser_execute_script`	Execute JavaScript code	`script`, `args`
`browser_screenshot`	Take a screenshot	`filename` (optional)
`browser_select_dropdown_by_text`	Select dropdown option by visible text	`by`, `value`, `text`, `timeout`
`browser_select_dropdown_by_value`	Select dropdown option by value	`by`, `value`, `dropdownValue`, `timeout`
`browser_key_press`	Press a keyboard key in the browser	`key`, `timeout`
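
`browser_drag_and_drop` is the one tool in this table that takes two locators: `by`/`value` for the source and `targetBy`/`targetValue` for the target. A small sketch of assembling those arguments follows. The `Locator` type and `dragAndDropArgs` helper are illustrative, and it is an assumption that `targetBy` accepts the same strategy names as `by`.

```typescript
// Strategy names as listed under "Element Locator Strategies" in this README.
type By = "id" | "css" | "xpath" | "name" | "tag" | "class";

interface Locator {
  by: By;
  value: string;
}

interface DragAndDropArgs {
  by: By;
  value: string;
  targetBy: By; // assumed to accept the same strategies as `by`
  targetValue: string;
  timeout?: number; // unit is not stated in this README
}

// Combine a source and a target locator into one argument object.
function dragAndDropArgs(source: Locator, target: Locator, timeout?: number): DragAndDropArgs {
  const args: DragAndDropArgs = {
    by: source.by,
    value: source.value,
    targetBy: target.by,
    targetValue: target.value,
  };
  if (timeout !== undefined) {
    args.timeout = timeout;
  }
  return args;
}

// Hypothetical selectors for a card being dragged into a column.
const dragArgs = dragAndDropArgs(
  { by: "css", value: "#card-1" },
  { by: "id", value: "done-column" },
);
```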

Scrolling Tools

Tool	Description	Parameters
`browser_scroll_to_element`	Scroll to bring an element into view	`by`, `value`, `timeout`
`browser_scroll_to_top`	Scroll to the top of the page	None
`browser_scroll_to_bottom`	Scroll to the bottom of the page	None
`browser_scroll_to_coordinates`	Scroll to specific coordinates	`x`, `y`
`browser_scroll_by_pixels`	Scroll by specified number of pixels	`x`, `y`

Form Interaction Tools

Tool	Description	Parameters
`browser_select_checkbox`	Select/check a checkbox	`by`, `value`, `timeout`
`browser_unselect_checkbox`	Unselect/uncheck a checkbox	`by`, `value`, `timeout`
`browser_submit_form`	Submit a form element	`by`, `value`, `timeout`
`browser_focus_element`	Focus on a specific element	`by`, `value`, `timeout`
`browser_blur_element`	Remove focus from a specific element	`by`, `value`, `timeout`

Element Locator Strategies

id: Find by element ID
css: Find by CSS selector
xpath: Find by XPath expression
name: Find by name attribute
tag: Find by HTML tag name
class: Find by CSS class name
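
Because the `by` parameter accepts only the six strategy names above, a client-side check can catch typos before a tool call is sent. The following is a minimal sketch of such a check. `isLocatorStrategy` and `locatorArgs` are hypothetical helpers, not functions exported by this package.

```typescript
// The six strategy names listed above.
const LOCATOR_STRATEGIES = ["id", "css", "xpath", "name", "tag", "class"] as const;
type LocatorStrategy = (typeof LOCATOR_STRATEGIES)[number];

interface LocatorArgs {
  by: LocatorStrategy;
  value: string;
  timeout?: number; // unit is not stated in this README
}

// Type guard: narrows an arbitrary string to a known strategy name.
function isLocatorStrategy(input: string): input is LocatorStrategy {
  return LOCATOR_STRATEGIES.some((strategy) => strategy === input);
}

// Validate untrusted input and build the shared `by`/`value`/`timeout` arguments.
function locatorArgs(by: string, value: string, timeout?: number): LocatorArgs {
  if (!isLocatorStrategy(by)) {
    throw new Error(`Unsupported locator strategy: ${by}`);
  }
  if (value.length === 0) {
    throw new Error("Locator value must not be empty");
  }
  return timeout === undefined ? { by, value } : { by, value, timeout };
}

const clickArgs = locatorArgs("css", "button[type='submit']");
```

A value such as `"linkText"` is rejected here because it is not among the strategies this README lists.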

📋 Requirements

Node.js: Version 18.0.0 or higher
Browsers: Chrome, Firefox, Safari, or Edge installed
WebDrivers: Automatically managed by selenium-webdriver
Operating System: Windows, macOS, or Linux
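
To check the Node.js requirement from a script, a dotted version string can be compared against the minimum. This is a small sketch with illustrative helper names; it handles plain numeric versions only and ignores pre-release tags.

```typescript
// Compare dotted numeric versions such as "18.0.0" and "20.11.1".
// Returns -1, 0, or 1; missing segments count as 0.
function compareVersions(a: string, b: string): number {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  for (let i = 0; i < Math.max(partsA.length, partsB.length); i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

const MIN_NODE_VERSION = "18.0.0"; // from the requirements list above

function meetsNodeRequirement(currentVersion: string): boolean {
  return compareVersions(currentVersion, MIN_NODE_VERSION) >= 0;
}
```

In a Node.js script, `process.versions.node` supplies the running version without the leading `v`, so `meetsNodeRequirement(process.versions.node)` performs the check.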

🚦 Development

Getting Started

Clone the repository

bash
git clone https://github.com/pshivapr/selenium-mcp.git
cd selenium-mcp

Install dependencies

bash
npm install

Build the project

bash
npm run build

Running the Server

Production Mode

bash
npm start

Development Mode (with auto-reload)

bash
npm run dev

Direct Execution

bash
node dist/index.js

Using as CLI Tool

You can also run the published package as a command with npx, without cloning or building the repository:

bash
npx selenium-webdriver-mcp@latest

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (`git checkout -b feature/AmazingFeature`)
Commit your changes (`git commit -m 'Add some AmazingFeature'`)
Push to the branch (`git push origin feature/AmazingFeature`)
Open a Pull Request

Built with ❤️ for the Model Context Protocol ecosystem

Frequently asked questions

What is the Selenium WebDriver MCP server used for?

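Selenium WebDriver MCP is an MCP server that exposes Selenium WebDriver browser automation (navigation, element interaction, screenshots, cookie management, JavaScript execution) as tools for compatible AI clients. In TypingMind, you can add this MCP server once and make its tools available in your AI workspace.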

Can I use Selenium WebDriver MCP with multiple AI models in TypingMind?

Yes. TypingMind connects MCP tools at the workspace level, so you can use Selenium WebDriver with different AI models such as Claude, ChatGPT, Gemini, or other models you have configured in TypingMind without setting up the MCP server separately for each model.

Why use Selenium WebDriver MCP with TypingMind?

TypingMind is one of the best frontends for LLM chat because it brings multiple AI models, prompts, plugins, AI agents, API keys, and MCP tools into one workspace. With Selenium WebDriver connected, you can use its MCP tools across your preferred models while keeping your chat workflow organized in TypingMind.

How do I connect Selenium WebDriver MCP to TypingMind?

Selenium WebDriver MCP runs through the TypingMind local MCP connector. This option suits MCP servers that need access to local files, desktop apps, command-line tools, or private resources on your computer.

What tools does Selenium WebDriver MCP provide in TypingMind?

Selenium WebDriver MCP exposes 56 MCP tools that can be enabled from the TypingMind Plugins page and used in chat or assigned to AI agents.

Do I need to share my API keys with TypingMind to use Selenium WebDriver MCP?

No. TypingMind is local-first and lets you keep your model providers, API keys, prompts, and MCP configuration under your control. If the Selenium WebDriver MCP server requires authentication, add the required headers, OAuth settings, or local configuration for that MCP server when you create the connection.

Publisher	pshivapr
Repository	`selenium-mcp`
Language	TypeScript
Forks	5
Stars	6
Available tools	56
Transport type	stdio
Categories	Browser Automation Search Web
License	MIT
Links	GitHub