Browser Automation

Community
hihuzhen

Browser MCP

Publisher: hihuzhen
Repository: browser-mcp
Language: JavaScript
Forks: 1
Stars: 2
Available tools: 10
Transport type: stdio
License: MIT
  • Connect tools to AI workflows

    Browser Automation exposes MCP capabilities that can be used by compatible AI clients and agents.

  • 10 available tools

    Browse the callable actions below, including names and descriptions when provided by the server.

  • Ready-to-copy setup

    Use the installation snippets to configure this server in your preferred MCP client.

  • Open source signals

    2 stars and 1 fork from the linked repository.

Browser MCP Server 🚀

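Browser MCP Server is a browser MCP (Model Context Protocol) server built on WebSocket communication that lets AI assistants control your browser.

🚀 Features

  • WebSocket communication: uses WebSocket in place of the original communication method, giving more efficient two-way communication
  • Python backend: the app server is rewritten entirely in Python on top of the FastMCP framework
  • Browser automation: lets AI assistants perform a range of browser operations
  • Runs locally: everything runs on your machine, which keeps your data private
  • Multiple tools: supports screenshots, interactive operations, and other tools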

📁 Project structure

├── packages/           # Project packages
│   ├── app/            # MCP server implemented in Python
│   │   ├── src/nep_browser_engine/  # Main source directory
│   │   ├── pyproject.toml  # Python project configuration
│   │   └── .gitignore      # gitignore file for the Python project
│   └── extension/      # Chrome browser extension
│       ├── common/     # Shared code and constants
│       ├── entrypoints/ # Entry points (background and popup)
│       └── inject-scripts/ # Scripts injected into web pages
├── .gitignore          # Root gitignore file
└── LICENSE             # License file

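App (Python)

Main components:

  • WebSocket service: runs the WebSocket server that communicates with the browser extension
  • MCP service: implements the MCP protocol and exposes the browser control tools
  • Message handling: processes WebSocket messages and MCP tool calls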
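
As a rough illustration of the message-handling role, the sketch below shows one way a server could match replies coming back from the extension to the tool calls that triggered them. The message fields (`id`, `tool`, `args`, `result`) and the `PendingCalls` class are assumptions made for this example, not the project's actual wire format or code.

```python
import asyncio
import itertools
import json


class PendingCalls:
    """Match replies from the extension to the tool calls that caused them."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._futures = {}

    def create_request(self, tool, args):
        """Return (request_id, JSON text to send, future resolved by the reply)."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._futures[request_id] = future
        # Hypothetical envelope: the real field names may differ.
        payload = json.dumps({"id": request_id, "tool": tool, "args": args})
        return request_id, payload, future

    def handle_reply(self, raw):
        """Resolve the matching future; return False for unknown ids."""
        message = json.loads(raw)
        future = self._futures.pop(message.get("id"), None)
        if future is None or future.done():
            return False
        future.set_result(message.get("result"))
        return True


async def demo():
    pending = PendingCalls()
    request_id, payload, future = pending.create_request(
        "browser_navigate", {"url": "https://example.com"}
    )
    # A real server would send `payload` over the WebSocket here; this demo
    # fakes the extension's reply instead.
    reply = json.dumps({"id": request_id, "result": {"ok": True}})
    pending.handle_reply(reply)
    return await asyncio.wait_for(future, timeout=1)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping one future per request id lets several tool calls be in flight over a single WebSocket connection without their replies getting mixed up.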

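Extension (TypeScript)

Main components:

  • WebSocket client: communicates with the Python server
  • Tool handlers: handle tool call requests from the server
  • Injected scripts: perform operations inside web pages

🛠️ Core features

Page interaction

  • Element click: click page elements by CSS selector
  • Form filling: fill in form fields or select options
  • Keyboard: simulate keyboard input
  • Page content: extract page text and HTML
  • Get elements: retrieve specific elements from the page
  • Interactive element detection: automatically identify interactive elements on the page

Media and network

  • Screenshot: capture the whole page or a specific element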

🚀 Quick start

Prerequisites

  • Python 3.9+ and pip/poetry/uv
  • Chrome/Chromium browser

Installation

1. Install the Chrome extension

bash
cd packages/extension
pnpm install
pnpm run build

# Or download a specific version from the releases page

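Then, in Chrome:

  1. Open chrome://extensions/
  2. Enable "Developer mode"
  3. Click "Load unpacked"

2. Run the server

Add this entry to your MCP client's configuration; the client then launches the server through uvx: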

json
{
  "mcpServers": {
    "nep-browser-engine": {
      "type": "stdio",
      "command": "uvx",
      "args": ["nep-browser-engine"]
    }
  }
}
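
MCP clients keep this configuration in different files, and many users already have other servers listed. The sketch below shows one way to add the entry to an existing configuration without overwriting other servers. The `add_server` helper and the `other-server` entry are hypothetical and exist only for illustration.

```python
import json

# Entry copied from the configuration above.
NEP_BROWSER_ENGINE = {
    "type": "stdio",
    "command": "uvx",
    "args": ["nep-browser-engine"],
}


def add_server(config, name="nep-browser-engine", entry=None):
    """Return a copy of `config` with the server entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = dict(entry or NEP_BROWSER_ENGINE)
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other-command"}}}
print(json.dumps(add_server(existing), indent=2))
```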

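3. Connect the extension to the server

Click the extension icon in the browser and connect to the WebSocket service.

📝 Usage

Using with MCP clients

The server works with AI clients that support the MCP protocol, such as Claude and CherryStudio.

🛠️ Available tools

The main tools are:

Browser management

  • get_windows_and_tabs: get all open windows and tabs
  • browser_navigate: navigate to a URL or refresh the current tab
  • browser_close_tabs: close specific tabs or windows
  • browser_go_back_or_forward: go back or forward in browser history

Page interaction

  • browser_click_element: click a page element
  • browser_fill_or_select: fill in a form field or select an option
  • browser_get_elements: get page elements
  • browser_keyboard: simulate keyboard input
  • browser_get_web_content: get web page content
  • browser_screenshot: take a screenshot of the page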
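
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `browser_navigate`. The `url` argument name is an assumption; the actual parameter names come from each tool's input schema, which the server reports via `tools/list`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The "url" argument name is an assumption; check the tool's input schema
# for the real parameter names.
request = build_tool_call(1, "browser_navigate", {"url": "https://example.com"})
print(json.dumps(request, indent=2))
```

In normal use the AI client builds and sends these messages itself, so a sketch like this is mainly useful for debugging or for testing the server by hand.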

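🔧 Development guide

Python server development

  1. Make sure all dependencies are installed
  2. Edit app/src/nep_browser_engine/config.py to configure the WebSocket port and other parameters
  3. Pass a flag at startup to choose the transport: python -m nep_browser_engine.app --transport stdio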
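
When starting the server from a script rather than from an MCP client, the command in step 3 can be assembled programmatically. This is a minimal sketch: the module path and the `--transport stdio` flag come from the step above, while the `server_command` helper is hypothetical. Only `stdio` is documented here, so any other transport value is an assumption.

```python
import shlex
import sys


def server_command(transport="stdio", python=sys.executable):
    """Build the argv list for starting the server with a given transport."""
    return [python, "-m", "nep_browser_engine.app", "--transport", transport]


# Print the command as a shell-safe string; pass the list itself to
# subprocess.Popen to actually start the server.
print(shlex.join(server_command()))
```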

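Chrome extension development

  1. After changing the code, run pnpm run build to rebuild the extension
  2. The extension reloads automatically (when in developer mode)
  3. The default WebSocket address is ws://localhost:18765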
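
If the extension fails to connect, a quick first check is whether anything is listening on the default port. The sketch below only tests TCP reachability of localhost:18765 (the default address given above); it does not perform a WebSocket handshake, and the `is_port_open` helper is an illustrative name, not part of the project.

```python
import socket


def is_port_open(host="localhost", port=18765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "listening" if is_port_open() else "not reachable"
    print(f"localhost:18765 is {state}")
```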

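📋 Notes

  • This project is still under development and may contain bugs and rough edges
  • Make sure you understand what each tool does and its potential risks before using it
  • Do not use this project for any illegal or unauthorized activity

🤝 Contributing

Issues and PRs that help improve the project are welcome!

Acknowledgements

This project draws on hangwin/mcp-chrome

📄 License

MIT License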

Installation

TypingMind
Prerequisites:

Node.js 18+, plus uv, which provides the uvx command used in the configuration below

{
  "mcpServers": {
    "nep-browser-engine": {
      "command": "uvx",
      "args": [
        "nep-browser-engine"
      ]
    }
  }
}

Available Tools

  • browser_navigate

    Navigate to a URL or refresh the current tab.

  • get_windows_and_tabs

    Get all currently open browser windows and tabs

  • browser_go_back_or_forward

    Navigate back or forward in browser history

  • browser_click_element

    Click on an element in the current page

  • browser_fill_or_select

    Fill a form element or select an option with the specified value

  • browser_get_elements

    Get elements from the current page

  • browser_keyboard

    Simulate keyboard events in the browser

  • browser_get_web_content

    Fetch content from a web page

  • browser_close_tabs

    Close one or more browser tabs

  • browser_screenshot

    Take a screenshot of the current page or a specific element (to read the page's content, use browser_get_web_content first)

Use Browser Automation MCP with multiple AI models

TypingMind connects MCP tools at the workspace level, so once Browser Automation is connected, you can use it with different AI models in TypingMind instead of setting it up separately for each model. This MCP runs locally through the TypingMind MCP connector on your device.

Setup guide to use the local connector

Use this when the MCP server needs access to local files, apps, or private resources on your computer.

Step 1: Open the MCP settings

In TypingMind, go to Settings, Advanced Settings, then Model Context Protocol and choose Setup Connector.

  1. Open TypingMind in your browser.
  2. Click the Settings icon.
  3. Go to Advanced Settings.
  4. Open the Model Context Protocol section.
  5. Click Setup Connector and choose This Device.

Step 2: Run the connector command

Choose This Device, copy the command from TypingMind, and run it in Terminal. Keep the process running while you use MCP.

  1. Copy the setup command shown by TypingMind.
  2. Open Terminal on macOS or Windows Terminal on Windows.
  3. Paste and run the command.
  4. Approve the package install if Terminal asks you to proceed.
  5. Keep the Terminal window running while using MCP tools.

Step 3: Add Browser Automation as a server

When the connector status is Ready, click Edit Servers and paste the MCP server configuration.

  1. Wait until the connector status shows Ready.
  2. Click Edit Servers.
  3. Paste the Browser Automation MCP server configuration.
  4. Save the server list.
  5. Refresh if you want to confirm the connector is still ready.

{
  "mcpServers": {
    "nep-browser-engine": {
      "command": "uvx",
      "args": [
        "nep-browser-engine"
      ]
    }
  }
}

Step 4: Use it across models

Save the server list, open Plugins, enable the Browser Automation MCP tools, then select any supported AI model in TypingMind and use the tools in chat or assign them to an AI agent.

  1. Open the Plugins page in TypingMind.
  2. Enable the Browser Automation MCP tools.
  3. Start a chat and choose the AI model you want to use.
  4. Use the MCP tools in chat or assign them to an AI agent.
  5. Switch to another AI model whenever needed without reconnecting MCP.

Frequently asked questions

What is the Browser Automation MCP server used for?

Browser Automation is an MCP server that lets compatible AI clients connect to external tools and context. In TypingMind, you can add this MCP server once and make its tools available in your AI workspace.

Can I use Browser Automation MCP with multiple AI models in TypingMind?

Yes. TypingMind connects MCP tools at the workspace level, so you can use Browser Automation with different AI models such as Claude, ChatGPT, Gemini, or other models you have configured in TypingMind without setting up the MCP server separately for each model.

Why use Browser Automation MCP with TypingMind?

TypingMind is one of the best frontends for LLM chat because it brings multiple AI models, prompts, plugins, AI agents, API keys, and MCP tools into one workspace. With Browser Automation connected, you can use its MCP tools across your preferred models while keeping your chat workflow organized in TypingMind.

How do I connect Browser Automation MCP to TypingMind?

Browser Automation runs through the TypingMind local MCP connector. This is best when the MCP server needs access to local files, desktop apps, command-line tools, or private resources on your computer.

What tools does Browser Automation MCP provide in TypingMind?

Browser Automation exposes 10 MCP tools that can be enabled from the TypingMind Plugins page and used in chat or assigned to AI agents.

Do I need to share my API keys with TypingMind to use Browser Automation MCP?

No. TypingMind is local-first and lets you keep your model providers, API keys, prompts, and MCP configuration under your control. If Browser Automation requires authentication, add the required headers, OAuth settings, or local configuration for that MCP server when you create the connection.
