Merge branch 'main' into fix/scroll-direction-pixels

Simon
2026-04-02 18:31:56 +08:00
committed by GitHub
47 changed files with 978 additions and 680 deletions


@@ -28,7 +28,7 @@ jobs:
run: npm run build:website run: npm run build:website
- name: Setup Pages - name: Setup Pages
uses: actions/configure-pages@v5 uses: actions/configure-pages@v6
- name: Upload artifact - name: Upload artifact
uses: actions/upload-pages-artifact@v4 uses: actions/upload-pages-artifact@v4
@@ -37,4 +37,4 @@ jobs:
- name: Deploy to GitHub Pages - name: Deploy to GitHub Pages
id: deployment id: deployment
uses: actions/deploy-pages@v4 uses: actions/deploy-pages@v5


@@ -2,42 +2,13 @@
♥️ We welcome contributions from everyone. ♥️ We welcome contributions from everyone.
## 🚀 Quick Start For local development workflows, setup, local LLM config, extension development, testing on other websites, and more details, see [docs/developer-guide.md](docs/developer-guide.md).
### Development Setup
1. **Prerequisites**
- `macOS` / `Linux` / `WSL`
- `node.js >= 20` with `npm >= 10`
- An editor that supports `ts/eslint/prettier`
- Make sure `eslint`, `prettier` and `commitlint` work well. Un-linted code won't pass the CI.
2. **Setup**
```bash
npm i
npm start # Start demo and documentation site
npm run build # Build libs and website
```
### Project Structure
This is a **monorepo** with npm workspaces containing **4 main packages**:
- **Page Agent** (`packages/page-agent/`) - Main entry with built-in UI Panel, published as `page-agent` on npm
- **Core** (`packages/core/`) - Core agent logic without UI (npm: `@page-agent/core`)
- **Extension** (`packages/extension/`) - Chrome extension for multi-page tasks and browser-level automation
- **Website** (`packages/website/`) - React documentation and landing page. Also as demo and test page for the core lib. private package `@page-agent/website`
> We use a simplified monorepo solution with `native npm-workspace + ts reference + vite alias`. No fancy tooling. Hoisting is required.
>
> - When developing. Use alias so that we don't have to pre-build.
> - When bundling. Use external and disable ts `paths` alias.
> - When bundling `IIFE` and `Website`. Bundle everything together.
## 🤝 How to Contribute ## 🤝 How to Contribute
### Reporting Issues > **[Maintainer's Note](https://github.com/alibaba/page-agent/issues/349)**
### Opening Issues
- Use the GitHub issue tracker to report bugs or request features - Use the GitHub issue tracker to report bugs or request features
- Search existing issues before creating new ones - Search existing issues before creating new ones
@@ -46,147 +17,24 @@ This is a **monorepo** with npm workspaces containing **4 main packages**:
### Code Contributions ### Code Contributions
1. **Fork and Clone** 1. Follow existing code style and patterns
2. Update documentation as needed
```bash 3. Add JSDoc for public APIs
git clone https://github.com/your-username/page-agent.git 4. Build and lint everything
cd page-agent 5. Test in our demo website, and on other websites if applicable
``` 6. Include screenshots for UI changes
2. **Create Feature Branch**
```bash
git checkout -b feat/your-feature-name
```
3. **Make Changes**
- Follow existing code style and patterns
- Add tests for new functionality
- Update documentation as needed
4. **Test Your Changes**
- Build and lint everything.
- Test in our demo website
- Test it on other websites if applicable
- `@TODO: test suite`
5. **Commit and Push**
```bash
git add .
git commit -m "feat: add awesome feature"
git push origin feat/your-feature-name
```
6. **Create Pull Request**
- Provide clear description of changes
- Link related issues
- Include screenshots for UI changes
## 📝 Code Style
### General Guidelines
- Use TypeScript for type safety
- Follow existing naming conventions
- Write meaningful commit messages
- Keep functions small and focused
- Add JSDoc for public APIs
### Vibe Coding with AI ### Vibe Coding with AI
> [Vibe coding](https://en.wikipedia.org/wiki/Vibe_coding) - Vibe coding is **NOT** allowed for the core lib or the extension!!!
- Vibe coding is **RECOMMENDED** when maintaining **the demo, the website, the UI and tests**. - Vibe coding is **RECOMMENDED** when maintaining **the demo, the website, the UI and tests**.
- We have a [website/AGENTS.md](packages/website/AGENTS.md) for that. - Make sure your AI references `AGENTS.md` and `website/AGENTS.md` for better quality.
- Vibe coding is **NOT** allowed for the core lib!!!
- NEVER try to vibe coding the MV3 extension!!! It is HELL.
- Review anything AI wrote before making a commit. You are the author of anything you commit. NOT AI. - Review anything AI wrote before making a commit. You are the author of anything you commit. NOT AI.
If your AI assistant does not support [AGENTS.md](https://agents.md/), add an alias for it:
- claude-code (`CLAUDE.md`)
```markdown
@AGENTS.md
```
- antigravity (`.agent/rules/alias.md`)
```markdown
---
trigger: always_on
---
@../../AGENTS.md
```
## 🔧 Development Workflows
### Test With Your Own LLM API
- Create a `.env` file in the repo root with your LLM API config
```env
LLM_MODEL_NAME=gpt-5.2
LLM_API_KEY=your-api-key
LLM_BASE_URL=https://api.your-llm-provider.com/v1
```
- **Ollama example** (tested on 0.15 + qwen3:14b, RTX3090 24GB):
```env
LLM_BASE_URL="http://localhost:11434/v1"
LLM_API_KEY="NA"
LLM_MODEL_NAME="qwen3:14b"
```
> @see https://alibaba.github.io/page-agent/docs/features/models#ollama for configuration
- **Restart the dev server** to load new env vars
- If not provided, the demo will use the free testing proxy by default. By using it, you agree to its [terms](https://github.com/alibaba/page-agent/blob/main/docs/terms-and-privacy.md).
### Extension Development
```bash
# make sure you ran `npm run build:libs` first
# and every time you changed the core libs
npm run dev -w @page-agent/ext
npm run zip -w @page-agent/ext
```
- Update `packages/extension/docs/extension_api.md` for API integration details
### Testing on Other Websites
- Start and serve a local `iife` script
```bash
npm run dev:demo # Serving IIFE with auto rebuild at http://localhost:5174/page-agent.demo.js
```
- Add a new bookmark
```javascript
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log(%27PageAgent ready!%27);document.head.appendChild(s);})();
```
- Click the bookmark on any page to load Page-Agent
> Warning: AK in your local `.env` will be inlined in the iife script. Be very careful when you distribute the script.
### Adding Documentation
Ask an AI to help you add documentation to the `website/` package. Follow the existing style.
> Our AGENTS.md file and guardrails are designed for this purpose. But please be careful and review anything AI generated.
## 🚫 What We Don't Accept ## 🚫 What We Don't Accept
- Breaking changes and large PRs without prior discussion - Breaking changes and large PRs without prior discussion
- Heavy dependencies to core libs - Heavy dependencies to core libs
- Contributions without proper testing
- Code that doesn't follow project conventions
- Dependencies or code with licenses incompatible with MIT - Dependencies or code with licenses incompatible with MIT
- Bot or AI-generated pull requests without meaningful human involvement - Bot or AI-generated pull requests without meaningful human involvement
@@ -194,12 +42,6 @@ Ask an AI to help you add documentation to the `website/` package. Follow the ex
By contributing to this project, you agree that your contributions will be licensed under the MIT License. By contributing to this project, you agree that your contributions will be licensed under the MIT License.
> CLA is optional. ---
## 💬 Questions?
- Open a GitHub issue for technical questions
- Check existing documentation and issues first
- Be respectful and constructive in discussions
Thank you for helping make PageAgent better! 🎉 Thank you for helping make PageAgent better! 🎉


@@ -25,15 +25,16 @@ The GUI Agent Living in Your Webpage. Control web interfaces with natural langua
- **📖 Text-based DOM manipulation** - **📖 Text-based DOM manipulation**
- No screenshots. No multi-modal LLMs or special permissions needed. - No screenshots. No multi-modal LLMs or special permissions needed.
- **🧠 Bring your own LLMs** - **🧠 Bring your own LLMs**
- **🎨 Pretty UI with human-in-the-loop**
- **🐙 Optional [chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension) for multi-page tasks.** - **🐙 Optional [chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension) for multi-page tasks.**
- And an [MCP Server (Beta)](https://alibaba.github.io/page-agent/docs/features/mcp-server) to control it from outside
## 💡 Use Cases ## 💡 Use Cases
- **SaaS AI Copilot** — Ship an AI copilot in your product in lines of code. No backend rewrite. - **SaaS AI Copilot** — Ship an AI copilot in your product in lines of code. No backend rewrite.
- **Smart Form Filling** — Turn 20-click workflows into one sentence. Perfect for ERP, CRM, and admin systems. - **Smart Form Filling** — Turn 20-click workflows into one sentence. Perfect for ERP, CRM, and admin systems.
- **Accessibility** — Make any web app accessible through natural language. Voice commands, screen readers, zero barrier. - **Accessibility** — Make any web app accessible through natural language. Voice commands, screen readers, zero barrier.
- **Multi-page Agent** — Extend your own agent's reach across browser tabs with the optional [chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension). - **Multi-page Agent** — Extend your own web agent's reach across browser tabs [chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension).
- **MCP** - Allow your agent clients to control your browser.
## 🚀 Quick Start ## 🚀 Quick Start
@@ -49,8 +50,8 @@ Fastest way to try PageAgent with our free Demo LLM:
| Mirrors | URL | | Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- | | ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.6.0/dist/iife/page-agent.demo.js | | Global | https://cdn.jsdelivr.net/npm/page-agent@1.7.0/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.6.0/files/dist/iife/page-agent.demo.js | | China | https://registry.npmmirror.com/page-agent/1.7.0/files/dist/iife/page-agent.demo.js |
### NPM Installation ### NPM Installation
@@ -75,11 +76,15 @@ For more programmatic usage, see [📖 Documentations](https://alibaba.github.io
## 🤝 Contributing ## 🤝 Contributing
We welcome contributions from the community! Follow our instructions in [CONTRIBUTING.md](CONTRIBUTING.md) for setup and guidelines. We welcome contributions from the community! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines and [docs/developer-guide.md](docs/developer-guide.md) for local development workflows.
Please read [Code of Conduct](docs/CODE_OF_CONDUCT.md) before contributing. Please read the [maintainer's note](https://github.com/alibaba/page-agent/issues/349) on principles and current state.
Contributions generated entirely by bots or agents without substantial human involvement will not be accepted, and bot accounts may be blocked. Contributions generated entirely by **bots or AI** without substantial human involvement will **not be accepted**.
## ⚖️ License
[MIT License](LICENSE)
## 👏 Acknowledgments ## 👏 Acknowledgments
@@ -97,23 +102,18 @@ Licensed under the MIT License
We gratefully acknowledge the browser-use project and its contributors for their We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make excellent work on web automation and DOM interaction patterns that helped make
this project possible. this project possible.
Third-party dependencies and their licenses can be found in the package.json
file and in the node_modules directory after installation.
``` ```
## 📄 License ## 🌟 Awesome Page Agent
[MIT License](LICENSE) Built something cool with PageAgent? Add it here! Open a PR to share your project.
> These are community projects — not maintained or endorsed by us. Use at your own discretion.
| Project | Description |
| -------- | ----------------------------------------------------------- |
| _Yours?_ | [Open a PR](https://github.com/alibaba/page-agent/pulls) 🙌 |
--- ---
**⭐ Star this repo if you find PageAgent helpful!** **⭐ Star this repo if you find PageAgent helpful!**
<a href="https://www.star-history.com/?repos=alibaba%2Fpage-agent&type=date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/image?repos=alibaba/page-agent&type=date&theme=dark&legend=top-left&v=7" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/image?repos=alibaba/page-agent&type=date&legend=top-left&v=7" />
<img alt="Star History Chart" src="https://api.star-history.com/image?repos=alibaba/page-agent&type=date&legend=top-left&v=7" />
</picture>
</a>


@@ -20,20 +20,20 @@
## ✨ Features ## ✨ Features
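- **🎯 Easy integration** - **🎯 Easy integration**
- No `browser extension` / `Python` / `headless browser` needed - No `browser extension` / `Python` / `headless browser` needed, just in-page JavaScript
- Pure in-page JavaScript. Everything happens inside your web page.
- **📖 Text-based DOM manipulation** - **📖 Text-based DOM manipulation**
- No screenshots, no multimodal models or special permissions needed - No screenshots, no multimodal models or special permissions needed
- **🧠 Use your own LLM** - **🧠 Bring your own LLM**
- **🎨 Polished UI with human-in-the-loop support** - 🐙 Optional [Chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension) for cross-page tasks
- **🐙 Optional [Chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension) for cross-page tasks.** - [MCP Server (Beta)](https://alibaba.github.io/page-agent/docs/features/mcp-server)
## 💡 Use Cases ## 💡 Use Cases
- **SaaS AI co-pilot**: Add an AI co-pilot to your product in a few lines of code, with no backend rewrite. - **SaaS AI Copilot**: Add an AI co-pilot to your product in a few lines of code, with no backend rewrite.
- **Smart form filling**: Turn 20 clicks into one sentence. An ideal companion for ERP, CRM, and admin back offices. - **Smart form filling**: Turn 20 clicks into one sentence. An ideal companion for ERP, CRM, and admin back offices.
- **Accessibility enhancement**: Make any web page accessible through natural language. Voice commands, screen readers, zero barrier. - **Accessibility enhancement**: Make any web page accessible through natural language. Voice commands, screen readers, zero barrier.
- **Cross-page Agent**: With the optional [Chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension), let your own Agent work across tabs. - **Cross-page Agent**: With the optional [Chrome extension](https://alibaba.github.io/page-agent/docs/features/chrome-extension), let your own Web Agent work across tabs.
- Add browser control to your existing Agent through MCP.
## 🚀 Quick Start ## 🚀 Quick Start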
@@ -49,8 +49,8 @@
| Mirrors | URL | | Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- | | ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.6.0/dist/iife/page-agent.demo.js | | Global | https://cdn.jsdelivr.net/npm/page-agent@1.7.0/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.6.0/files/dist/iife/page-agent.demo.js | | China | https://registry.npmmirror.com/page-agent/1.7.0/files/dist/iife/page-agent.demo.js |
### NPM Installation ### NPM Installation
@@ -75,11 +75,13 @@ await agent.execute('点击登录按钮')
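## 🤝 Contributing ## 🤝 Contributing
Community contributions are welcome! See [CONTRIBUTING.md](../CONTRIBUTING.md) for setup and contribution guidelines. Please read the [Code of Conduct](CODE_OF_CONDUCT.md) before contributing. Community contributions are welcome! See [CONTRIBUTING.md](../CONTRIBUTING.md) for setup and contribution guidelines.
We do not accept code generated entirely by bots or agents without substantial human involvement, and bot accounts may be banned from interacting. Before opening an issue or PR, please read the [maintainer's note](https://github.com/alibaba/page-agent/issues/349) and the [Code of Conduct](CODE_OF_CONDUCT.md).
## 👏 Acknowledgments We do not accept code generated entirely by bots or agents without substantial human involvement.
## 👏 Notices and Acknowledgments
This project builds on the excellent work of **[`browser-use`](https://github.com/browser-use/browser-use)**. This project builds on the excellent work of **[`browser-use`](https://github.com/browser-use/browser-use)**.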
@@ -95,12 +97,9 @@ Licensed under the MIT License
We gratefully acknowledge the browser-use project and its contributors for their We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make excellent work on web automation and DOM interaction patterns that helped make
this project possible. this project possible.
Third-party dependencies and their licenses can be found in the package.json
file and in the node_modules directory after installation.
``` ```
## 📄 License ## ⚖️ License
[MIT License](../LICENSE) [MIT License](../LICENSE)
@@ -108,10 +107,3 @@ file and in the node_modules directory after installation.
**⭐ If you find PageAgent useful or interesting, please give the project a star!** **⭐ If you find PageAgent useful or interesting, please give the project a star!**
<a href="https://www.star-history.com/?repos=alibaba%2Fpage-agent&type=date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/image?repos=alibaba/page-agent&type=date&theme=dark&legend=top-left&v=7" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/image?repos=alibaba/page-agent&type=date&legend=top-left&v=7" />
<img alt="Star History Chart" src="https://api.star-history.com/image?repos=alibaba/page-agent&type=date&legend=top-left&v=7" />
</picture>
</a>

117
docs/developer-guide.md Normal file

@@ -0,0 +1,117 @@
# Developer Guide
This file is for local development workflows.
For contribution rules and expectations, see [../CONTRIBUTING.md](../CONTRIBUTING.md).
## 🚀 Quick Start
### Development Setup
1. **Prerequisites**
- `macOS` / `Linux` / `WSL`
- `node.js >= 20` with `npm >= 10`
- An editor that supports `ts/eslint/prettier`
- Make sure `eslint`, `prettier` and `commitlint` work well. Un-linted code won't pass the CI.
2. **Setup**
```bash
npm i
npm start # Start demo and documentation site
npm run build # Build libs and website
```
## 📦 Project Structure
This is a **monorepo** with npm workspaces containing **4 main packages**:
- **Page Agent** (`packages/page-agent/`) - Main entry with built-in UI Panel, published as `page-agent` on npm
- **Core** (`packages/core/`) - Core agent logic without UI (npm: `@page-agent/core`)
- **Extension** (`packages/extension/`) - Chrome extension for multi-page tasks and browser-level automation
- **Website** (`packages/website/`) - React documentation and landing page. Also as demo and test page for the core lib. private package `@page-agent/website`
> We use a simplified monorepo solution with `native npm-workspace + ts reference + vite alias`. No fancy tooling. Hoisting is required.
>
> - When developing. Use alias so that we don't have to pre-build.
> - When bundling. Use external and disable ts `paths` alias.
> - When bundling `IIFE` and `Website`. Bundle everything together.
## 🤖 AGENTS.md Alias
If your AI assistant does not support [AGENTS.md](https://agents.md/), add an alias for it:
- claude-code (`CLAUDE.md`)
```markdown
@AGENTS.md
```
- antigravity (`.agent/rules/alias.md`)
```markdown
---
trigger: always_on
---
@../../AGENTS.md
```
## 🔧 Development Workflows
### Test With Your Own LLM API
- Create a `.env` file in the repo root with your LLM API config
```env
LLM_MODEL_NAME=gpt-5.2
LLM_API_KEY=your-api-key
LLM_BASE_URL=https://api.your-llm-provider.com/v1
```
- **Ollama example** (tested on 0.15 + qwen3:14b, RTX3090 24GB):
```env
LLM_BASE_URL="http://localhost:11434/v1"
LLM_API_KEY="NA"
LLM_MODEL_NAME="qwen3:14b"
```
> @see https://alibaba.github.io/page-agent/docs/features/models#ollama for configuration
- **Restart the dev server** to load new env vars
- If not provided, the demo will use the free testing proxy by default. By using it, you agree to its [terms](./terms-and-privacy.md).
### Extension Development
```bash
# make sure you ran `npm run build:libs` first and every time you changed the core libs
npm run dev -w @page-agent/ext
npm run zip -w @page-agent/ext
```
- Update `packages/extension/docs/extension_api.md` for API integration details
### Testing on Other Websites
- Start and serve a local `iife` script
```bash
npm run dev:demo # Serving IIFE with auto rebuild at http://localhost:5174/page-agent.demo.js
```
- Add a new bookmark
```javascript
javascript:(function(){var s=document.createElement('script');s.src=`http://localhost:5174/page-agent.demo.js?t=${Math.random()}`;s.onload=()=>console.log(%27PageAgent ready!%27);document.head.appendChild(s);})();
```
- Click the bookmark on any page to load Page-Agent
> Warning: AK in your local `.env` will be inlined in the iife script. Be very careful when you distribute the script.
### Adding Documentation
Ask an AI to help you add documentation to the `website/` package. Follow the existing style.
> Our AGENTS.md file and guardrails are designed for this purpose. But please be careful and review anything AI generated.

520
package-lock.json generated

File diff suppressed because it is too large


@@ -1,7 +1,7 @@
{ {
"name": "root", "name": "root",
"private": true, "private": true,
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"workspaces": [ "workspaces": [
"packages/page-controller", "packages/page-controller",
@@ -42,7 +42,7 @@
"@commitlint/config-conventional": "^20.5.0", "@commitlint/config-conventional": "^20.5.0",
"@eslint/js": "^9.39.2", "@eslint/js": "^9.39.2",
"@microsoft/api-extractor": "^7.57.7", "@microsoft/api-extractor": "^7.57.7",
"@tailwindcss/vite": "^4.2.1", "@tailwindcss/vite": "^4.2.2",
"@trivago/prettier-plugin-sort-imports": "^6.0.2", "@trivago/prettier-plugin-sort-imports": "^6.0.2",
"@types/node": "^25.5.0", "@types/node": "^25.5.0",
"@vitejs/plugin-react-swc": "^4.3.0", "@vitejs/plugin-react-swc": "^4.3.0",
@@ -60,7 +60,7 @@
"lint-staged": "^16.4.0", "lint-staged": "^16.4.0",
"prettier": "^3.8.0", "prettier": "^3.8.0",
"typescript": "^5.9.3", "typescript": "^5.9.3",
"typescript-eslint": "^8.57.1", "typescript-eslint": "^8.58.0",
"unplugin-dts": "^1.0.0-beta.6", "unplugin-dts": "^1.0.0-beta.6",
"vite": "^7.3.1", "vite": "^7.3.1",
"vite-plugin-css-injected-by-js": "^4.0.1", "vite-plugin-css-injected-by-js": "^4.0.1",


@@ -1,7 +1,7 @@
{ {
"name": "@page-agent/core", "name": "@page-agent/core",
"private": false, "private": false,
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"main": "./dist/esm/page-agent-core.js", "main": "./dist/esm/page-agent-core.js",
"module": "./dist/esm/page-agent-core.js", "module": "./dist/esm/page-agent-core.js",
@@ -44,8 +44,8 @@
}, },
"dependencies": { "dependencies": {
"chalk": "^5.6.2", "chalk": "^5.6.2",
"@page-agent/llms": "1.6.0", "@page-agent/llms": "1.7.0",
"@page-agent/page-controller": "1.6.0" "@page-agent/page-controller": "1.7.0"
}, },
"peerDependencies": { "peerDependencies": {
"zod": "^3.25.0 || ^4.0.0" "zod": "^3.25.0 || ^4.0.0"


@@ -118,9 +118,18 @@ export interface ExecuteConfig {
model: string model: string
apiKey?: string apiKey?: string
// Global system-level instructions for the agent.
// Equivalent to AgentConfig.instructions.system.
systemInstruction?: string
// Include the initial tab where page JS starts. Default: true. // Include the initial tab where page JS starts. Default: true.
includeInitialTab?: boolean includeInitialTab?: boolean
// Control all unpinned tabs in the window instead of only the tab group.
// When enabled, agent sees and can switch to every non-pinned tab.
// Default: false. Experimental.
experimentalIncludeAllTabs?: boolean
onStatusChange?: (status: AgentStatus) => void onStatusChange?: (status: AgentStatus) => void
onActivity?: (activity: AgentActivity) => void onActivity?: (activity: AgentActivity) => void
onHistoryUpdate?: (history: HistoricalEvent[]) => void onHistoryUpdate?: (history: HistoricalEvent[]) => void
@@ -207,7 +216,11 @@ interface ExecuteConfig {
baseURL: string baseURL: string
model: string model: string
apiKey?: string apiKey?: string
systemInstruction?: string
includeInitialTab?: boolean includeInitialTab?: boolean
experimentalIncludeAllTabs?: boolean
onStatusChange?: (status: AgentStatus) => void onStatusChange?: (status: AgentStatus) => void
onActivity?: (activity: AgentActivity) => void onActivity?: (activity: AgentActivity) => void
onHistoryUpdate?: (history: HistoricalEvent[]) => void onHistoryUpdate?: (history: HistoricalEvent[]) => void
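
Note on the new `ExecuteConfig` fields: the sketch below shows how a caller might set `systemInstruction` and `experimentalIncludeAllTabs`, and how the documented defaults (`includeInitialTab` true, `experimentalIncludeAllTabs` false) could be resolved. `ExecuteConfigSketch` and `resolveTabScope` are hypothetical names introduced only for illustration, and the URL and model values are placeholders. It is not the extension's actual implementation.

```typescript
// Hypothetical subset of ExecuteConfig, reduced to the fields this sketch needs.
interface ExecuteConfigSketch {
  baseURL: string
  model: string
  apiKey?: string
  systemInstruction?: string
  includeInitialTab?: boolean
  experimentalIncludeAllTabs?: boolean
}

// Hypothetical helper: applies the defaults stated in the field comments.
function resolveTabScope(config: ExecuteConfigSketch) {
  return {
    includeInitialTab: config.includeInitialTab ?? true,
    experimentalIncludeAllTabs: config.experimentalIncludeAllTabs ?? false,
  }
}

const exampleConfig: ExecuteConfigSketch = {
  baseURL: 'https://api.your-llm-provider.com/v1', // placeholder
  model: 'your-model-name', // placeholder
  systemInstruction: 'Prefer read-only actions unless the task says otherwise.',
  experimentalIncludeAllTabs: true, // opt in to the experimental all-tabs mode
}

const exampleScope = resolveTabScope(exampleConfig)
```

Using `??` rather than `||` keeps an explicit `false` for `includeInitialTab` from being overwritten by the default.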


@@ -1,7 +1,7 @@
{ {
"name": "@page-agent/ext", "name": "@page-agent/ext",
"private": true, "private": true,
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"scripts": { "scripts": {
"dev": "wxt", "dev": "wxt",
@@ -16,31 +16,31 @@
"@radix-ui/react-separator": "^1.1.8", "@radix-ui/react-separator": "^1.1.8",
"@radix-ui/react-slot": "^1.2.4", "@radix-ui/react-slot": "^1.2.4",
"@radix-ui/react-switch": "^1.2.6", "@radix-ui/react-switch": "^1.2.6",
"@types/chrome": "^0.1.37", "@types/chrome": "^0.1.38",
"@types/react": "^19.2.14", "@types/react": "^19.2.14",
"@types/react-dom": "^19.2.1", "@types/react-dom": "^19.2.1",
"@wxt-dev/module-react": "^1.2.2", "@wxt-dev/module-react": "^1.2.2",
"class-variance-authority": "^0.7.1", "class-variance-authority": "^0.7.1",
"clsx": "^2.1.1", "clsx": "^2.1.1",
"idb": "^8.0.3", "idb": "^8.0.3",
"lucide-react": "^0.577.0", "lucide-react": "^1.7.0",
"motion": "^12.37.0", "motion": "^12.38.0",
"next-themes": "^0.4.6", "next-themes": "^0.4.6",
"react": "^19.2.4", "react": "^19.2.4",
"react-dom": "^19.2.4", "react-dom": "^19.2.4",
"rough-notation": "^0.5.1", "rough-notation": "^0.5.1",
"simple-icons": "^16.12.0", "simple-icons": "^16.14.0",
"sonner": "^2.0.7", "sonner": "^2.0.7",
"tailwind-merge": "^3.5.0", "tailwind-merge": "^3.5.0",
"tailwindcss": "^4.1.14", "tailwindcss": "^4.1.14",
"tw-animate-css": "^1.4.0", "tw-animate-css": "^1.4.0",
"wxt": "^0.20.19" "wxt": "^0.20.20"
}, },
"dependencies": { "dependencies": {
"@page-agent/core": "1.6.0", "@page-agent/core": "1.7.0",
"@page-agent/llms": "1.6.0", "@page-agent/llms": "1.7.0",
"@page-agent/page-controller": "1.6.0", "@page-agent/page-controller": "1.7.0",
"@page-agent/ui": "1.6.0", "@page-agent/ui": "1.7.0",
"ai-motion": "^0.4.8", "ai-motion": "^0.4.8",
"chalk": "^5.6.2" "chalk": "^5.6.2"
}, },


@@ -11,13 +11,18 @@ function detectLanguage(): 'en-US' | 'zh-CN' {
return lang.startsWith('zh') ? 'zh-CN' : 'en-US' return lang.startsWith('zh') ? 'zh-CN' : 'en-US'
} }
interface MultiPageAgentConfig extends AgentConfig {
includeInitialTab?: boolean
experimentalIncludeAllTabs?: boolean
}
/** /**
* MultiPageAgent * MultiPageAgent
* - use with extension * - use with extension
* - can be used from a side panel or a content script * - can be used from a side panel or a content script
*/ */
export class MultiPageAgent extends PageAgentCore { export class MultiPageAgent extends PageAgentCore {
constructor(config: AgentConfig & { includeInitialTab?: boolean }) { constructor(config: MultiPageAgentConfig) {
// multi page controller // multi page controller
const tabsController = new TabsController() const tabsController = new TabsController()
const pageController = new RemotePageController(tabsController) const pageController = new RemotePageController(tabsController)
@@ -31,8 +36,8 @@ export class MultiPageAgent extends PageAgentCore {
`Default working language: **${targetLanguage}**` `Default working language: **${targetLanguage}**`
) )
// include initial tab for controlling
const includeInitialTab = config.includeInitialTab ?? true const includeInitialTab = config.includeInitialTab ?? true
const experimentalIncludeAllTabs = config.experimentalIncludeAllTabs ?? false
/** /**
* When the agent is in side-panel and user closed the side-panel. * When the agent is in side-panel and user closed the side-panel.
@@ -50,7 +55,7 @@ export class MultiPageAgent extends PageAgentCore {
customSystemPrompt: systemPrompt, customSystemPrompt: systemPrompt,
onBeforeTask: async (agent) => { onBeforeTask: async (agent) => {
await tabsController.init(agent.task, includeInitialTab) await tabsController.init(agent.task, { includeInitialTab, experimentalIncludeAllTabs })
heartBeatInterval = window.setInterval(() => { heartBeatInterval = window.setInterval(() => {
chrome.storage.local.set({ chrome.storage.local.set({


@@ -10,9 +10,7 @@ export function handlePageControlMessage(
): true | undefined { ): true | undefined {
const PREFIX = '[RemotePageController.background]' const PREFIX = '[RemotePageController.background]'
function debug(...messages: any[]) { const debug = console.debug.bind(console, `\x1b[90m${PREFIX}\x1b[0m`)
console.debug(`\x1b[90m${PREFIX}\x1b[0m`, ...messages)
}
const { action, payload, targetTabId } = message const { action, payload, targetTabId } = message
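
Note on the `debug` change: replacing the wrapper function with `console.debug.bind(...)` pre-applies the colored prefix as the first argument. A bound console method is also generally reported by browser devtools at the caller's location rather than inside a wrapper, which is presumably the motivation here. The sketch below isolates the pattern; `makeDebug` and `DebugSink` are hypothetical names, and the injectable sink exists only so the behavior can be checked outside a browser.

```typescript
// Minimal shape of a console-like object; the global `console` satisfies it.
interface DebugSink {
  debug(...args: unknown[]): void
}

// Hypothetical factory: returns a debug function with the gray ANSI prefix
// bound as the first argument, mirroring the one-liner in the diff.
function makeDebug(prefix: string, sink: DebugSink = console) {
  return sink.debug.bind(sink, `\x1b[90m${prefix}\x1b[0m`)
}

// Usage: every call is forwarded with the prefix prepended.
const debugExample = makeDebug('[RemotePageController.background]')
debugExample('get_active_tab', { windowId: 1 })
```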


@@ -4,9 +4,7 @@ import type { TabsController } from './TabsController'
const PREFIX = '[RemotePageController]' const PREFIX = '[RemotePageController]'
function debug(...messages: any[]) { const debug = console.debug.bind(console, `\x1b[90m${PREFIX}\x1b[0m`)
console.debug(`\x1b[90m${PREFIX}\x1b[0m`, ...messages)
}
function sendMessage(message: { function sendMessage(message: {
type: 'PAGE_CONTROL' type: 'PAGE_CONTROL'


@@ -5,9 +5,7 @@ import type { TabAction } from './TabsController'
const PREFIX = '[TabsController.background]' const PREFIX = '[TabsController.background]'
function debug(...messages: any[]) { const debug = console.debug.bind(console, `\x1b[90m${PREFIX}\x1b[0m`)
console.debug(`\x1b[90m${PREFIX}\x1b[0m`, ...messages)
}
export function handleTabControlMessage( export function handleTabControlMessage(
message: { type: 'TAB_CONTROL'; action: TabAction; payload: any }, message: { type: 'TAB_CONTROL'; action: TabAction; payload: any },
@@ -20,11 +18,10 @@ export function handleTabControlMessage(
case 'get_active_tab': { case 'get_active_tab': {
debug('get_active_tab') debug('get_active_tab')
chrome.tabs chrome.tabs
.query({ active: true, currentWindow: true }) .query({ active: true })
.then((tabs) => { .then((tabs) => {
const tabId = tabs.length > 0 ? tabs[0].id || null : null debug('get_active_tab: success', tabs)
debug('get_active_tab: success', tabId) sendResponse({ success: true, tab: tabs[0] })
sendResponse({ success: true, tabId })
}) })
.catch((error) => { .catch((error) => {
sendResponse({ error: error instanceof Error ? error.message : String(error) }) sendResponse({ error: error instanceof Error ? error.message : String(error) })
@@ -63,7 +60,7 @@ export function handleTabControlMessage(
case 'create_tab_group': { case 'create_tab_group': {
debug('create_tab_group', payload) debug('create_tab_group', payload)
chrome.tabs chrome.tabs
.group({ tabIds: payload.tabIds }) .group({ tabIds: payload.tabIds, createProperties: { windowId: payload.windowId } })
.then((groupId) => { .then((groupId) => {
debug('create_tab_group: success', groupId) debug('create_tab_group: success', groupId)
sendResponse({ success: true, groupId }) sendResponse({ success: true, groupId })
@@ -114,47 +111,59 @@ export function handleTabControlMessage(
return true // async response return true // async response
} }
case 'get_window_tabs': {
debug('get_window_tabs', payload)
chrome.tabs
.query({ windowId: payload.windowId })
.then((tabs) => {
sendResponse({ success: true, tabs })
})
.catch((error) => {
sendResponse({ error: error instanceof Error ? error.message : String(error) })
})
return true
}
default: default:
sendResponse({ error: `Unknown action: ${action}` }) sendResponse({ error: `Unknown action: ${action}` })
return return
} }
} }
export function setupTabChangeEvents() { const tabEventPorts = new Set<chrome.runtime.Port>()
console.log('[TabsController.background] setupTabChangeEvents')
function broadcastTabEvent(message: object) {
for (const port of tabEventPorts) {
port.postMessage(message)
}
}
/**
* Port-based tab events: agents connect via `chrome.runtime.connect({ name: 'tab-events' })`
* and receive tab change events through the port. Works for both extension pages and content scripts.
*/
export function setupTabEventsPort() {
chrome.runtime.onConnect.addListener((port) => {
if (port.name !== 'tab-events') return
debug('port connected', port.sender?.tab?.id ?? port.sender?.url)
tabEventPorts.add(port)
port.onDisconnect.addListener(() => {
debug('port disconnected')
tabEventPorts.delete(port)
})
})
chrome.tabs.onCreated.addListener((tab) => { chrome.tabs.onCreated.addListener((tab) => {
debug('onCreated', tab) broadcastTabEvent({ action: 'created', payload: { tab } })
chrome.runtime
.sendMessage({ type: 'TAB_CHANGE', action: 'created', payload: { tab } })
.catch((error) => {
debug('onCreated error:', error)
})
}) })
chrome.tabs.onRemoved.addListener((tabId, removeInfo) => { chrome.tabs.onRemoved.addListener((tabId, removeInfo) => {
debug('onRemoved', tabId, removeInfo) broadcastTabEvent({ action: 'removed', payload: { tabId, removeInfo } })
chrome.runtime
.sendMessage({
type: 'TAB_CHANGE',
action: 'removed',
payload: { tabId, removeInfo },
})
.catch((error) => {
debug('onRemoved error:', error)
})
}) })
chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => { chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
debug('onUpdated', tabId, changeInfo) broadcastTabEvent({ action: 'updated', payload: { tabId, changeInfo, tab } })
chrome.runtime
.sendMessage({
type: 'TAB_CHANGE',
action: 'updated',
payload: { tabId, changeInfo, tab },
})
.catch((error) => {
debug('onUpdated error:', error)
})
}) })
} }
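
The port-based broadcast introduced in this file can be illustrated without the Chrome APIs. This is a minimal sketch, not the extension's code: `PortLike` and `PortRegistry` are hypothetical names standing in for `chrome.runtime.Port` and the module-level `tabEventPorts` set, and dropping a port whose `postMessage` throws is an added assumption that the diff itself does not make.

```typescript
// Hypothetical stand-in for chrome.runtime.Port, reduced to what the sketch needs.
interface PortLike {
  name: string
  postMessage(message: object): void
}

// Hypothetical registry mirroring the `tabEventPorts` set plus `broadcastTabEvent`.
class PortRegistry {
  private ports = new Set<PortLike>()

  // Accept only ports named 'tab-events'; return a function that unregisters the port.
  connect(port: PortLike): () => void {
    if (port.name !== 'tab-events') return () => {}
    this.ports.add(port)
    return () => {
      this.ports.delete(port)
    }
  }

  // Send the message to every registered port and report how many received it.
  // Assumption (not in the diff): a port that throws is treated as dead and dropped.
  broadcast(message: object): number {
    let delivered = 0
    for (const port of this.ports) {
      try {
        port.postMessage(message)
        delivered++
      } catch (error) {
        this.ports.delete(port)
      }
    }
    return delivered
  }

  get size(): number {
    return this.ports.size
  }
}
```

Compared with the removed `chrome.runtime.sendMessage` calls, a registry like this only does work when at least one agent is connected, which is presumably why the per-event `.catch` handlers could be dropped.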


@@ -2,9 +2,7 @@ import { isContentScriptAllowed } from './RemotePageController'
const PREFIX = '[TabsController]' const PREFIX = '[TabsController]'
function debug(...messages: any[]) { const debug = console.debug.bind(console, `\x1b[90m${PREFIX}\x1b[0m`)
console.debug(`\x1b[90m${PREFIX}\x1b[0m`, ...messages)
}
function sendMessage(message: { function sendMessage(message: {
type: 'TAB_CONTROL' type: 'TAB_CONTROL'
@@ -22,46 +20,91 @@ function sendMessage(message: {
* - live in the agent env (extension page or content script) * - live in the agent env (extension page or content script)
* - no chrome apis. call sw for tab operations * - no chrome apis. call sw for tab operations
*/ */
export class TabsController extends EventTarget { export class TabsController {
currentTabId: number | null = null currentTabId: number | null = null
private disposed = false
private port: chrome.runtime.Port | null = null
private portRetries = 0
private windowId: number | null = null
private tabs: TabMeta[] = [] private tabs: TabMeta[] = []
private initialTabId: number | null = null private initialTabId: number | null = null
private tabGroupId: number | null = null private tabGroupId: number | null = null
private experimentalIncludeAllTabs = false
private task: string = '' private task: string = ''
async init(task: string, includeInitialTab: boolean = true) { async init(task: string, options: TabsInitOptions = {}) {
debug('init', task, includeInitialTab) const { includeInitialTab = true, experimentalIncludeAllTabs = false } = options
debug('init', task, options)
if (this.disposed) {
throw new Error('TabsController already disposed')
}
this.task = task
this.tabs = []
this.currentTabId = null this.currentTabId = null
this.disposed = false
this.port = null
this.portRetries = 0
this.windowId = null
this.tabs = []
this.tabGroupId = null this.tabGroupId = null
this.initialTabId = null this.initialTabId = null
this.experimentalIncludeAllTabs = experimentalIncludeAllTabs
this.task = task
const result = await sendMessage({ const activeTabResult = await sendMessage({
type: 'TAB_CONTROL', type: 'TAB_CONTROL',
action: 'get_active_tab', action: 'get_active_tab',
}) })
this.initialTabId = result.tabId this.initialTabId = activeTabResult.tab?.id
this.windowId = activeTabResult.tab?.windowId
if (!this.initialTabId) { if (!this.initialTabId || !this.windowId) {
throw new Error('Failed to get initial tab ID') if (activeTabResult.error) {
throw new Error(activeTabResult.error)
} else {
throw new Error('Failed to get active tab')
}
} }
if (includeInitialTab) { this.connectTabEvents()
if (experimentalIncludeAllTabs) {
const allTabs = await sendMessage({
type: 'TAB_CONTROL',
action: 'get_window_tabs',
payload: { windowId: this.windowId },
})
for (const tab of allTabs.tabs as chrome.tabs.Tab[]) {
if (tab.id && !tab.pinned && isContentScriptAllowed(tab.url)) {
this.addTab({
id: tab.id,
isInitial: tab.id === this.initialTabId,
url: tab.url,
title: tab.title,
status: tab.status,
})
}
}
if (this.tabs.find((t) => t.id === this.initialTabId)) {
this.currentTabId = this.initialTabId
await this.createTabGroup([this.initialTabId])
}
} else if (includeInitialTab) {
const info = await sendMessage({ const info = await sendMessage({
type: 'TAB_CONTROL', type: 'TAB_CONTROL',
action: 'get_tab_info', action: 'get_tab_info',
payload: { tabId: this.initialTabId }, payload: { tabId: this.initialTabId },
}) })
if (isContentScriptAllowed(info.url)) { if (isContentScriptAllowed(info.url) && !info.pinned) {
this.currentTabId = this.initialTabId this.currentTabId = this.initialTabId
this.tabs.push({ this.addTab({
id: result.tabId, id: this.initialTabId,
isInitial: true, isInitial: true,
url: info.url, url: info.url,
title: info.title, title: info.title,
@@ -73,52 +116,6 @@ export class TabsController extends EventTarget {
} }
await this.updateCurrentTabId(this.currentTabId) await this.updateCurrentTabId(this.currentTabId)
const tabChangeHandler = (message: any): void => {
if (message.type !== 'TAB_CHANGE') {
// throw new Error(`[TabsController]: Invalid message type: ${message.type}`)
return
}
if (message.action === 'created') {
const tab = message.payload.tab as chrome.tabs.Tab
if (tab.groupId === this.tabGroupId && tab.id != null) {
// Tab created in our controlled group
if (!this.tabs.find((t) => t.id === tab.id)) {
this.tabs.push({ id: tab.id, isInitial: false })
}
this.switchToTab(tab.id)
}
} else if (message.action === 'removed') {
const { tabId } = message.payload as { tabId: number }
const targetTab = this.tabs.find((t) => t.id === tabId)
if (targetTab) {
this.tabs = this.tabs.filter((t) => t.id !== tabId)
if (this.currentTabId === tabId) {
const newCurrentTab = this.tabs[this.tabs.length - 1] || null
if (newCurrentTab) {
this.switchToTab(newCurrentTab.id)
} else {
this.updateCurrentTabId(null)
}
}
}
} else if (message.action === 'updated') {
const { tabId, tab } = message.payload as { tabId: number; tab: chrome.tabs.Tab }
const targetTab = this.tabs.find((t) => t.id === tabId)
if (targetTab) {
targetTab.url = tab.url
targetTab.title = tab.title
targetTab.status = tab.status
}
}
}
chrome.runtime.onMessage.addListener(tabChangeHandler)
this.addEventListener('dispose', () => {
chrome.runtime.onMessage.removeListener(tabChangeHandler)
})
} }
async openNewTab(url: string): Promise<string> { async openNewTab(url: string): Promise<string> {
@@ -136,7 +133,7 @@ export class TabsController extends EventTarget {
const tabId = result.tabId as number const tabId = result.tabId as number
this.tabs.push({ this.addTab({
id: tabId, id: tabId,
isInitial: false, isInitial: false,
}) })
@@ -209,7 +206,7 @@ export class TabsController extends EventTarget {
const result = await sendMessage({ const result = await sendMessage({
type: 'TAB_CONTROL', type: 'TAB_CONTROL',
action: 'create_tab_group', action: 'create_tab_group',
payload: { tabIds }, payload: { tabIds, windowId: this.windowId },
}) })
if (!result?.success) { if (!result?.success) {
@@ -232,6 +229,11 @@ export class TabsController extends EventTarget {
}) })
} }
private addTab(meta: TabMeta) {
if (this.tabs.find((t) => t.id === meta.id)) return
this.tabs.push(meta)
}
async updateCurrentTabId(tabId: number | null) { async updateCurrentTabId(tabId: number | null) {
debug('updateCurrentTabId', tabId) debug('updateCurrentTabId', tabId)
@@ -288,9 +290,77 @@ export class TabsController extends EventTarget {
await waitUntil(() => tab.status === 'complete', 4_000) await waitUntil(() => tab.status === 'complete', 4_000)
} }
dispose() { /**
this.dispatchEvent(new Event('dispose')) * Connect to background SW via port to receive tab change events.
*
* @note Port is 1:1 (runtime.connect → background SW has no frames),
* so onDisconnect fires exactly once and we can safely reconnect.
* Reconnection may miss events during the gap.
* TODO: refresh this.tabs from background after reconnect to stay consistent.
*/
private connectTabEvents() {
this.port = chrome.runtime.connect({ name: 'tab-events' })
this.port.onMessage.addListener((message: any) => {
if (this.disposed) return
this.portRetries = 0
if (message.action === 'created') {
const tab = message.payload.tab as chrome.tabs.Tab
const shouldTrack = this.experimentalIncludeAllTabs || tab.groupId === this.tabGroupId
if (shouldTrack && tab.id != null) {
this.addTab({ id: tab.id, isInitial: false })
this.switchToTab(tab.id)
} }
} else if (message.action === 'removed') {
const { tabId } = message.payload as { tabId: number }
const targetTab = this.tabs.find((t) => t.id === tabId)
if (targetTab) {
this.tabs = this.tabs.filter((t) => t.id !== tabId)
if (this.currentTabId === tabId) {
const newCurrentTab = this.tabs[this.tabs.length - 1] || null
if (newCurrentTab) {
this.switchToTab(newCurrentTab.id)
} else {
this.updateCurrentTabId(null)
}
}
}
} else if (message.action === 'updated') {
const { tabId, tab } = message.payload as { tabId: number; tab: chrome.tabs.Tab }
const targetTab = this.tabs.find((t) => t.id === tabId)
if (targetTab) {
targetTab.url = tab.url
targetTab.title = tab.title
targetTab.status = tab.status
}
}
})
this.port.onDisconnect.addListener(() => {
this.port = null
if (this.disposed) return
if (this.portRetries >= 7) {
console.error(PREFIX, 'tab events port failed after 7 retries, giving up')
return
}
debug('port disconnected, reconnecting...')
this.portRetries++
this.connectTabEvents()
})
}
dispose() {
debug('dispose')
this.disposed = true
this.port?.disconnect()
this.port = null
}
}
export interface TabsInitOptions {
includeInitialTab?: boolean
experimentalIncludeAllTabs?: boolean
} }
export type TabAction = export type TabAction =
@@ -302,6 +372,7 @@ export type TabAction =
| 'add_tab_to_group' | 'add_tab_to_group'
| 'close_tab' | 'close_tab'
| 'get_tab_title' | 'get_tab_title'
| 'get_window_tabs'
interface TabMeta { interface TabMeta {
id: number id: number
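
The reconnect logic in `connectTabEvents` (reset the retry counter on any message, give up after a fixed number of consecutive disconnects, do nothing once disposed) can be sketched in isolation. This is an illustrative sketch under stated assumptions: `Handlers`, `BoundedReconnector`, and the `open` callback are hypothetical names, and the transport is a plain callback rather than `chrome.runtime.connect`.

```typescript
// Hypothetical handlers a transport would call; the diff uses port.onMessage / port.onDisconnect.
interface Handlers {
  onMessage(): void
  onDisconnect(): void
}

class BoundedReconnector {
  retries = 0
  connections = 0
  gaveUp = false
  private disposed = false
  private maxRetries: number
  private open: (handlers: Handlers) => void

  constructor(maxRetries: number, open: (handlers: Handlers) => void) {
    this.maxRetries = maxRetries
    this.open = open
  }

  // Open a connection and wire up the two handlers.
  start(): void {
    this.connections++
    this.open({
      onMessage: () => {
        if (this.disposed) return
        // Any successful message proves the link works, so the budget is restored.
        this.retries = 0
      },
      onDisconnect: () => {
        if (this.disposed) return
        if (this.retries >= this.maxRetries) {
          this.gaveUp = true
          return
        }
        this.retries++
        this.start()
      },
    })
  }

  // After dispose, later messages and disconnects are ignored.
  dispose(): void {
    this.disposed = true
  }
}
```

As the `@note` in the diff points out, events that fire between a disconnect and the next connection are lost; this sketch does not address that gap either.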


@@ -21,6 +21,7 @@ export interface AdvancedConfig {
maxSteps?: number maxSteps?: number
systemInstruction?: string systemInstruction?: string
experimentalLlmsTxt?: boolean experimentalLlmsTxt?: boolean
experimentalIncludeAllTabs?: boolean
disableNamedToolChoice?: boolean disableNamedToolChoice?: boolean
} }
@@ -125,6 +126,7 @@ export function useAgent(): UseAgentResult {
maxSteps, maxSteps,
systemInstruction, systemInstruction,
experimentalLlmsTxt, experimentalLlmsTxt,
experimentalIncludeAllTabs,
disableNamedToolChoice, disableNamedToolChoice,
...llmConfig ...llmConfig
}: ExtConfig) => { }: ExtConfig) => {
@@ -138,6 +140,7 @@ export function useAgent(): UseAgentResult {
maxSteps, maxSteps,
systemInstruction, systemInstruction,
experimentalLlmsTxt, experimentalLlmsTxt,
experimentalIncludeAllTabs,
disableNamedToolChoice, disableNamedToolChoice,
} }
await chrome.storage.local.set({ advancedConfig }) await chrome.storage.local.set({ advancedConfig })


@@ -31,17 +31,20 @@ export function ConfigPanel({ config, onSave, onClose }: ConfigPanelProps) {
const [model, setModel] = useState(config?.model || DEMO_MODEL) const [model, setModel] = useState(config?.model || DEMO_MODEL)
const [apiKey, setApiKey] = useState(config?.apiKey) const [apiKey, setApiKey] = useState(config?.apiKey)
const [language, setLanguage] = useState<LanguagePreference>(config?.language) const [language, setLanguage] = useState<LanguagePreference>(config?.language)
const [maxSteps, setMaxSteps] = useState<number | undefined>(config?.maxSteps) const [maxSteps, setMaxSteps] = useState(config?.maxSteps)
const [systemInstruction, setSystemInstruction] = useState(config?.systemInstruction ?? '') const [systemInstruction, setSystemInstruction] = useState(config?.systemInstruction ?? '')
const [experimentalLlmsTxt, setExperimentalLlmsTxt] = useState( const [experimentalLlmsTxt, setExperimentalLlmsTxt] = useState(
config?.experimentalLlmsTxt ?? false config?.experimentalLlmsTxt ?? false
) )
const [experimentalIncludeAllTabs, setExperimentalIncludeAllTabs] = useState(
config?.experimentalIncludeAllTabs ?? false
)
const [disableNamedToolChoice, setDisableNamedToolChoice] = useState( const [disableNamedToolChoice, setDisableNamedToolChoice] = useState(
config?.disableNamedToolChoice ?? false config?.disableNamedToolChoice ?? false
) )
const [advancedOpen, setAdvancedOpen] = useState(false) const [advancedOpen, setAdvancedOpen] = useState(false)
const [saving, setSaving] = useState(false) const [saving, setSaving] = useState(false)
const [userAuthToken, setUserAuthToken] = useState<string>('') const [userAuthToken, setUserAuthToken] = useState('')
const [copied, setCopied] = useState(false) const [copied, setCopied] = useState(false)
const [showToken, setShowToken] = useState(false) const [showToken, setShowToken] = useState(false)
const [showApiKey, setShowApiKey] = useState(false) const [showApiKey, setShowApiKey] = useState(false)
@@ -54,6 +57,7 @@ export function ConfigPanel({ config, onSave, onClose }: ConfigPanelProps) {
setMaxSteps(config?.maxSteps) setMaxSteps(config?.maxSteps)
setSystemInstruction(config?.systemInstruction ?? '') setSystemInstruction(config?.systemInstruction ?? '')
setExperimentalLlmsTxt(config?.experimentalLlmsTxt ?? false) setExperimentalLlmsTxt(config?.experimentalLlmsTxt ?? false)
setExperimentalIncludeAllTabs(config?.experimentalIncludeAllTabs ?? false)
setDisableNamedToolChoice(config?.disableNamedToolChoice ?? false) setDisableNamedToolChoice(config?.disableNamedToolChoice ?? false)
}, [config]) }, [config])
@@ -100,6 +104,7 @@ export function ConfigPanel({ config, onSave, onClose }: ConfigPanelProps) {
maxSteps: maxSteps || undefined, maxSteps: maxSteps || undefined,
systemInstruction: systemInstruction || undefined, systemInstruction: systemInstruction || undefined,
experimentalLlmsTxt, experimentalLlmsTxt,
experimentalIncludeAllTabs,
disableNamedToolChoice, disableNamedToolChoice,
}) })
} finally { } finally {
@@ -285,6 +290,14 @@ export function ConfigPanel({ config, onSave, onClose }: ConfigPanelProps) {
<span className="text-xs text-muted-foreground">Experimental llms.txt support</span> <span className="text-xs text-muted-foreground">Experimental llms.txt support</span>
<Switch checked={experimentalLlmsTxt} onCheckedChange={setExperimentalLlmsTxt} /> <Switch checked={experimentalLlmsTxt} onCheckedChange={setExperimentalLlmsTxt} />
</label> </label>
<label className="flex items-center justify-between cursor-pointer">
<span className="text-xs text-muted-foreground">Experimental include all tabs</span>
<Switch
checked={experimentalIncludeAllTabs}
onCheckedChange={setExperimentalIncludeAllTabs}
/>
</label>
</> </>
)} )}


@@ -111,6 +111,7 @@ export function EmptyState() {
]} ]}
cursorStyle="underscore" cursorStyle="underscore"
loop loop
startOnView={false}
typeSpeed={20} typeSpeed={20}
deleteSpeed={10} deleteSpeed={10}
pauseDelay={3000} pauseDelay={3000}


@@ -1,12 +1,12 @@
import { handlePageControlMessage } from '@/agent/RemotePageController.background' import { handlePageControlMessage } from '@/agent/RemotePageController.background'
import { handleTabControlMessage, setupTabChangeEvents } from '@/agent/TabsController.background' import { handleTabControlMessage, setupTabEventsPort } from '@/agent/TabsController.background'
export default defineBackground(() => { export default defineBackground(() => {
console.log('[Background] Service Worker started') console.log('[Background] Service Worker started')
// tab change events // tab change events
setupTabChangeEvents() setupTabEventsPort()
// generate user auth token // generate user auth token


@@ -70,11 +70,15 @@ async function exposeAgentToPage() {
try { try {
const { task, config } = payload const { task, config } = payload
const { systemInstruction, ...agentConfig } = config
// Dispose old instance before creating new one // Dispose old instance before creating new one
multiPageAgent?.dispose() multiPageAgent?.dispose()
multiPageAgent = new MultiPageAgent(config) multiPageAgent = new MultiPageAgent({
...agentConfig,
instructions: systemInstruction ? { system: systemInstruction } : undefined,
})
// events // events
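
The config mapping in this hunk (pull `systemInstruction` out of the incoming config and pass it on as `instructions.system`) can be sketched as a small pure function. This is a sketch only: `ExecuteConfigLike`, `AgentConfigLike`, and `toAgentConfig` are hypothetical names, and the real `ExecuteConfig` and `AgentConfig` types carry more fields than shown here.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
interface ExecuteConfigLike {
  model: string
  systemInstruction?: string
}

interface AgentConfigLike {
  model: string
  instructions?: { system: string }
}

// Move `systemInstruction` into `instructions.system`.
// An empty or missing instruction yields `instructions: undefined`.
function toAgentConfig(config: ExecuteConfigLike): AgentConfigLike {
  const { systemInstruction, ...agentConfig } = config
  return {
    ...agentConfig,
    instructions: systemInstruction ? { system: systemInstruction } : undefined,
  }
}
```

Because the check is a truthiness test, an empty string behaves the same as an omitted field.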


@@ -7,12 +7,21 @@ export interface ExecuteConfig {
model: string model: string
apiKey?: string apiKey?: string
/**
* Global system-level instructions for the agent.
* Equivalent to `AgentConfig.instructions.system`.
*/
systemInstruction?: string
/** /**
* Whether to include the initial tab (that holds this main world script) in the task. * Whether to include the initial tab (that holds this main world script) in the task.
* @default true * @default true
*/ */
includeInitialTab?: boolean includeInitialTab?: boolean
/** Control all unpinned tabs in the window instead of only the tab group. */
experimentalIncludeAllTabs?: boolean
onStatusChange?: (status: AgentStatus) => void onStatusChange?: (status: AgentStatus) => void
onActivity?: (activity: AgentActivity) => void onActivity?: (activity: AgentActivity) => void
onHistoryUpdate?: (history: HistoricalEvent[]) => void onHistoryUpdate?: (history: HistoricalEvent[]) => void
@@ -86,7 +95,9 @@ export default defineUnlistedScript(() => {
baseURL: config.baseURL, baseURL: config.baseURL,
model: config.model, model: config.model,
apiKey: config.apiKey, apiKey: config.apiKey,
systemInstruction: config.systemInstruction,
includeInitialTab: config.includeInitialTab, includeInitialTab: config.includeInitialTab,
experimentalIncludeAllTabs: config.experimentalIncludeAllTabs,
}, },
}, },
}, },


@@ -1,6 +1,6 @@
{ {
"name": "@page-agent/llms", "name": "@page-agent/llms",
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"main": "./dist/lib/page-agent-llms.js", "main": "./dist/lib/page-agent-llms.js",
"module": "./dist/lib/page-agent-llms.js", "module": "./dist/lib/page-agent-llms.js",


@@ -1,7 +1,7 @@
{ {
"name": "@page-agent/mcp", "name": "@page-agent/mcp",
"private": false, "private": false,
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"bin": { "bin": {
"page-agent-mcp": "src/index.js" "page-agent-mcp": "src/index.js"
@@ -28,8 +28,8 @@
"node": ">=20" "node": ">=20"
}, },
"dependencies": { "dependencies": {
"@modelcontextprotocol/sdk": "^1.27.1", "@modelcontextprotocol/sdk": "^1.29.0",
"ws": "^8.19.0", "ws": "^8.20.0",
"zod": "^4.3.5" "zod": "^4.3.5"
} }
} }


@@ -35,11 +35,14 @@ const mcpServer = new McpServer({ name: 'page-agent', version: '1.5.8' })
mcpServer.registerTool( mcpServer.registerTool(
'execute_task', 'execute_task',
{ {
description: description: "Execute a task in user's browser.",
'Execute a browser automation task described in natural language. ' + inputSchema: {
'The Page Agent extension will control the browser to complete the task. ' + task: z
'Blocks until the task is complete.', .string()
inputSchema: { task: z.string().describe('Task description in natural language') }, .describe(
'Task description. Give specific instructions for the task. Steps preferable. And the information you want to get after the task is done.'
),
},
}, },
async ({ task }) => { async ({ task }) => {
try { try {
@@ -50,7 +53,7 @@ mcpServer.registerTool(
{ {
type: 'text', type: 'text',
text: result.success text: result.success
? `Task completed successfully.\n\n${result.data}` ? `Task completed.\n\n${result.data}`
: `Task failed.\n\n${result.data}`, : `Task failed.\n\n${result.data}`,
}, },
], ],
@@ -67,7 +70,7 @@ mcpServer.registerTool(
mcpServer.registerTool( mcpServer.registerTool(
'get_status', 'get_status',
{ {
description: 'Check the current status of the Page Agent hub connection and agent.', description: 'Check the current status of the Page Agent hub.',
}, },
async () => ({ async () => ({
content: [ content: [


@@ -1,7 +1,7 @@
{ {
"name": "page-agent", "name": "page-agent",
"private": false, "private": false,
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"main": "./dist/esm/page-agent.js", "main": "./dist/esm/page-agent.js",
"module": "./dist/esm/page-agent.js", "module": "./dist/esm/page-agent.js",
@@ -44,10 +44,10 @@
"postpublish": "node -e \"['README.md','LICENSE'].forEach(f=>{try{require('fs').unlinkSync(f)}catch{}})\"" "postpublish": "node -e \"['README.md','LICENSE'].forEach(f=>{try{require('fs').unlinkSync(f)}catch{}})\""
}, },
"dependencies": { "dependencies": {
"@page-agent/core": "1.6.0", "@page-agent/core": "1.7.0",
"@page-agent/llms": "1.6.0", "@page-agent/llms": "1.7.0",
"@page-agent/page-controller": "1.6.0", "@page-agent/page-controller": "1.7.0",
"@page-agent/ui": "1.6.0", "@page-agent/ui": "1.7.0",
"chalk": "^5.6.2" "chalk": "^5.6.2"
}, },
"peerDependencies": { "peerDependencies": {


@@ -4,11 +4,11 @@
*/ */
import { type AgentConfig, PageAgentCore } from '@page-agent/core' import { type AgentConfig, PageAgentCore } from '@page-agent/core'
import { PageController, type PageControllerConfig } from '@page-agent/page-controller' import { PageController, type PageControllerConfig } from '@page-agent/page-controller'
import { Panel } from '@page-agent/ui' import { Panel, type PanelConfig } from '@page-agent/ui'
export * from '@page-agent/core' export * from '@page-agent/core'
export type PageAgentConfig = AgentConfig & PageControllerConfig export type PageAgentConfig = AgentConfig & PageControllerConfig & Omit<PanelConfig, 'language'>
export class PageAgent extends PageAgentCore { export class PageAgent extends PageAgentCore {
panel: Panel panel: Panel
@@ -23,6 +23,7 @@ export class PageAgent extends PageAgentCore {
this.panel = new Panel(this, { this.panel = new Panel(this, {
language: config.language, language: config.language,
promptForNextTask: config.promptForNextTask,
}) })
} }
} }


@@ -17,9 +17,10 @@ const DEMO_MODEL = 'qwen3.5-plus'
const DEMO_BASE_URL = 'https://page-ag-testing-ohftxirgbn.cn-shanghai.fcapp.run' const DEMO_BASE_URL = 'https://page-ag-testing-ohftxirgbn.cn-shanghai.fcapp.run'
const DEMO_API_KEY = 'NA' const DEMO_API_KEY = 'NA'
const currentScript = document.currentScript as HTMLScriptElement | null
// in case document.x is not ready yet // in case document.x is not ready yet
setTimeout(() => { setTimeout(() => {
const currentScript = document.currentScript as HTMLScriptElement | null
let config: PageAgentConfig let config: PageAgentConfig
if (currentScript) { if (currentScript) {


@@ -1,6 +1,6 @@
{ {
"name": "@page-agent/page-controller", "name": "@page-agent/page-controller",
"version": "1.6.0", "version": "1.7.0",
"type": "module", "type": "module",
"main": "./dist/lib/page-controller.js", "main": "./dist/lib/page-controller.js",
"module": "./dist/lib/page-controller.js", "module": "./dist/lib/page-controller.js",


@@ -218,6 +218,7 @@ export class PageController extends EventTarget {
* Clean up all element highlights * Clean up all element highlights
*/ */
async cleanUpHighlights(): Promise<void> { async cleanUpHighlights(): Promise<void> {
console.log('[PageController] cleanUpHighlights')
dom.cleanUpHighlights() dom.cleanUpHighlights()
} }
@@ -424,3 +425,5 @@ export class PageController extends EventTarget {
this.mask = null this.mask = null
} }
} }
export * from './actions'


@@ -4,6 +4,9 @@
*/ */
import type { InteractiveElementDomNode } from './dom/dom_tree/type' import type { InteractiveElementDomNode } from './dom/dom_tree/type'
import { import {
clickPointer,
disablePassThrough,
enablePassThrough,
getNativeValueSetter, getNativeValueSetter,
isHTMLElement, isHTMLElement,
isInputElement, isInputElement,
@@ -15,6 +18,7 @@ import {
/** /**
* Get the HTMLElement by index from a selectorMap. * Get the HTMLElement by index from a selectorMap.
* @private Internal method, subject to change at any time.
*/ */
export function getElementByIndex( export function getElementByIndex(
selectorMap: Map<number, InteractiveElementDomNode>, selectorMap: Map<number, InteractiveElementDomNode>,
@@ -41,19 +45,21 @@ let lastClickedElement: HTMLElement | null = null
function blurLastClickedElement() { function blurLastClickedElement() {
if (lastClickedElement) { if (lastClickedElement) {
lastClickedElement.dispatchEvent(new PointerEvent('pointerout', { bubbles: true }))
lastClickedElement.dispatchEvent(new PointerEvent('pointerleave', { bubbles: false }))
lastClickedElement.dispatchEvent(new MouseEvent('mouseout', { bubbles: true }))
lastClickedElement.dispatchEvent(new MouseEvent('mouseleave', { bubbles: false }))
lastClickedElement.blur() lastClickedElement.blur()
lastClickedElement.dispatchEvent(
new MouseEvent('mouseout', { bubbles: true, cancelable: true })
)
lastClickedElement.dispatchEvent(
new MouseEvent('mouseleave', { bubbles: false, cancelable: true })
)
lastClickedElement = null lastClickedElement = null
} }
} }
/** /**
* Simulate a click on the element * Simulate a full click following W3C Pointer Events + UI Events spec order:
* pointerover/enter → mouseover/enter → pointerdown → mousedown → [focus] →
* pointerup → mouseup → click
*
* @private Internal method, subject to change at any time.
*/ */
export async function clickElement(element: HTMLElement) { export async function clickElement(element: HTMLElement) {
blurLastClickedElement() blurLastClickedElement()
@@ -61,34 +67,67 @@ export async function clickElement(element: HTMLElement) {
lastClickedElement = element lastClickedElement = element
await scrollIntoViewIfNeeded(element) await scrollIntoViewIfNeeded(element)
// Scroll the iframe element itself into view if needed
const frame = element.ownerDocument.defaultView?.frameElement const frame = element.ownerDocument.defaultView?.frameElement
if (frame) await scrollIntoViewIfNeeded(frame) if (frame) await scrollIntoViewIfNeeded(frame)
await movePointerToElement(element) const rect = element.getBoundingClientRect()
window.dispatchEvent(new CustomEvent('PageAgent::ClickPointer')) const x = rect.left + rect.width / 2
const y = rect.top + rect.height / 2
await movePointerToElement(element, x, y)
await clickPointer()
await waitFor(0.1) await waitFor(0.1)
// hover it // Hit-test to find the deepest element at click coordinates, matching
element.dispatchEvent(new MouseEvent('mouseenter', { bubbles: true, cancelable: true })) // real browser behavior where events target the innermost element.
element.dispatchEvent(new MouseEvent('mouseover', { bubbles: true, cancelable: true })) // @note This may hit an element in the blacklist
// TODO: This is a temporary workaround. Should have been handled during dom extraction.
const doc = element.ownerDocument
await enablePassThrough()
const hitTarget = doc.elementFromPoint(x, y)
await disablePassThrough()
const target =
hitTarget instanceof HTMLElement && element.contains(hitTarget) ? hitTarget : element
// dispatch a sequence of events to ensure all listeners are triggered const pointerOpts = {
element.dispatchEvent(new MouseEvent('mousedown', { bubbles: true, cancelable: true })) bubbles: true,
cancelable: true,
clientX: x,
clientY: y,
pointerType: 'mouse',
}
const mouseOpts = { bubbles: true, cancelable: true, clientX: x, clientY: y, button: 0 }
// focus it to ensure it gets the click event // Hover — pointer events first, then mouse events (spec order)
element.focus() target.dispatchEvent(new PointerEvent('pointerover', pointerOpts))
target.dispatchEvent(new PointerEvent('pointerenter', { ...pointerOpts, bubbles: false }))
target.dispatchEvent(new MouseEvent('mouseover', mouseOpts))
target.dispatchEvent(new MouseEvent('mouseenter', { ...mouseOpts, bubbles: false }))
element.dispatchEvent(new MouseEvent('mouseup', { bubbles: true, cancelable: true })) // Press
element.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true })) target.dispatchEvent(new PointerEvent('pointerdown', pointerOpts))
target.dispatchEvent(new MouseEvent('mousedown', mouseOpts))
// dispatch a click event // Focus is not part of the standard pointer/mouse event sequence
// element.click() // "undefined and varies between user agents".
// We focus the original element (nearest focusable ancestor), not the hit-test target, matching browser behavior.
element.focus({ preventScroll: true })
await waitFor(0.2) // Wait to ensure click event processing completes // Release
target.dispatchEvent(new PointerEvent('pointerup', pointerOpts))
target.dispatchEvent(new MouseEvent('mouseup', mouseOpts))
// Click — activation behavior (navigation, form submit, etc.) triggers
// via bubbling from target up to the interactive ancestor.
target.click()
await waitFor(0.2)
} }
/**
* @private Internal method, subject to change at any time.
*/
export async function inputTextElement(element: HTMLElement, text: string) { export async function inputTextElement(element: HTMLElement, text: string) {
const isContentEditable = element.isContentEditable const isContentEditable = element.isContentEditable
if (!isInputElement(element) && !isTextAreaElement(element) && !isContentEditable) { if (!isInputElement(element) && !isTextAreaElement(element) && !isContentEditable) {
@@ -196,6 +235,7 @@ export async function inputTextElement(element: HTMLElement, text: string) {
/** /**
* @todo browser-use version is very complex and supports menu tags, need to follow up * @todo browser-use version is very complex and supports menu tags, need to follow up
* @private Internal method, subject to change at any time.
*/ */
export async function selectOptionElement(selectElement: HTMLSelectElement, optionText: string) { export async function selectOptionElement(selectElement: HTMLSelectElement, optionText: string) {
if (!isSelectElement(selectElement)) { if (!isSelectElement(selectElement)) {
@@ -219,6 +259,9 @@ interface ScrollableElement extends Element {
scrollIntoViewIfNeeded?: (centerIfNeeded?: boolean) => void scrollIntoViewIfNeeded?: (centerIfNeeded?: boolean) => void
} }
/**
* @private Internal method, subject to change at any time.
*/
export async function scrollIntoViewIfNeeded(element: Element) { export async function scrollIntoViewIfNeeded(element: Element) {
const el = element as ScrollableElement const el = element as ScrollableElement
if (typeof el.scrollIntoViewIfNeeded === 'function') { if (typeof el.scrollIntoViewIfNeeded === 'function') {
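
The event order that the new `clickElement` follows (hover, press, focus, release, click, with pointer events ahead of their mouse counterparts) can be written down as data and replayed. This is a DOM-free sketch: `ClickStep`, `buildClickSequence`, and `replay` are hypothetical names, the `focus` step stands for the `element.focus()` call rather than a dispatched event, and the `click` step stands for `target.click()`.

```typescript
// Hypothetical description of one step in the simulated click.
interface ClickStep {
  phase: 'hover' | 'press' | 'focus' | 'release' | 'click'
  type: string
  bubbles: boolean
}

// The order used in the diff: pointer events precede the matching mouse events,
// and the enter events are the only dispatched ones that do not bubble.
function buildClickSequence(): ClickStep[] {
  return [
    { phase: 'hover', type: 'pointerover', bubbles: true },
    { phase: 'hover', type: 'pointerenter', bubbles: false },
    { phase: 'hover', type: 'mouseover', bubbles: true },
    { phase: 'hover', type: 'mouseenter', bubbles: false },
    { phase: 'press', type: 'pointerdown', bubbles: true },
    { phase: 'press', type: 'mousedown', bubbles: true },
    // Stands for element.focus({ preventScroll: true }), not a dispatched event.
    { phase: 'focus', type: 'focus', bubbles: false },
    { phase: 'release', type: 'pointerup', bubbles: true },
    { phase: 'release', type: 'mouseup', bubbles: true },
    // Stands for target.click(), which produces a bubbling click event.
    { phase: 'click', type: 'click', bubbles: true },
  ]
}

// Run every step through a caller-supplied dispatcher and return the order of types.
function replay(steps: ClickStep[], dispatch: (step: ClickStep) => void): string[] {
  const log: string[] = []
  for (const step of steps) {
    dispatch(step)
    log.push(step.type)
  }
  return log
}
```

In a browser, the `dispatch` callback would construct a `PointerEvent` or `MouseEvent` from each step and send it to the hit-test target, as the diff does inline.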


@@ -18,6 +18,7 @@
* @edit improve `sampleRect`, filter out rects with 0 area * @edit improve `sampleRect`, filter out rects with 0 area
* @edit exclude aria-hidden elements * @edit exclude aria-hidden elements
* @edit make sure attributes exist for interactive candidates. * @edit make sure attributes exist for interactive candidates.
* @edit fix "aria-*" attributes check
*/ */
export default ( export default (
@@ -1143,6 +1144,31 @@ export default (
* @param {HTMLElement} element - The element to check. * @param {HTMLElement} element - The element to check.
* @returns {boolean} Whether the element is an interactive candidate. * @returns {boolean} Whether the element is an interactive candidate.
*/ */
// @edit fix "aria-*" attributes check
const INTERACTIVE_ARIA_ATTRS = [
'aria-expanded',
'aria-checked',
'aria-selected',
'aria-pressed',
'aria-haspopup',
'aria-controls',
'aria-owns',
'aria-activedescendant',
'aria-valuenow',
'aria-valuetext',
'aria-valuemax',
'aria-valuemin',
'aria-autocomplete',
]
function hasInteractiveAria(el) {
for (let i = 0; i < INTERACTIVE_ARIA_ATTRS.length; i++) {
if (el.hasAttribute(INTERACTIVE_ARIA_ATTRS[i])) return true
}
return false
}
function isInteractiveCandidate(element) { function isInteractiveCandidate(element) {
if (!element || element.nodeType !== Node.ELEMENT_NODE) return false if (!element || element.nodeType !== Node.ELEMENT_NODE) return false
@@ -1167,7 +1193,7 @@ export default (
   element.hasAttribute('onclick') ||
   element.hasAttribute('role') ||
   element.hasAttribute('tabindex') ||
-  element.hasAttribute('aria-') ||
+  hasInteractiveAria(element) ||
   element.hasAttribute('data-action') ||
   element.getAttribute('contenteditable') === 'true'
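
Review note on the `aria-*` fix: `hasAttribute` looks up one exact attribute name, so the old `hasAttribute('aria-')` check could only match an attribute literally named `aria-`. The sketch below illustrates the difference under stated assumptions: `FakeElement` and `AttributeHost` are hypothetical stand-ins for a DOM element (so the example runs outside a browser), and the attribute list is shortened from the one in the diff.

```typescript
// Hypothetical stand-in for a DOM element; not part of the codebase.
interface AttributeHost {
  hasAttribute(name: string): boolean
}

class FakeElement implements AttributeHost {
  private attrs: Set<string>
  constructor(names: string[]) {
    this.attrs = new Set(names)
  }
  // Mirrors Element.hasAttribute: an exact-name lookup, not a prefix match.
  hasAttribute(name: string): boolean {
    return this.attrs.has(name)
  }
}

// Shortened copy of the list introduced in the diff.
const INTERACTIVE_ARIA_ATTRS = ['aria-expanded', 'aria-checked', 'aria-pressed']

function hasInteractiveAria(el: AttributeHost): boolean {
  return INTERACTIVE_ARIA_ATTRS.some((name) => el.hasAttribute(name))
}

const toggle = new FakeElement(['class', 'aria-expanded'])
const oldCheck = toggle.hasAttribute('aria-') // false: no attribute has that exact name
const newCheck = hasInteractiveAria(toggle) // true: 'aria-expanded' is on the list
const labelOnly = hasInteractiveAria(new FakeElement(['aria-label'])) // false: not on the list
```

One consequence of the allow-list approach is that purely descriptive attributes such as `aria-label` no longer count toward interactivity on their own.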


@@ -5,7 +5,7 @@ import { isPageDark } from './checkDarkMode'
 import styles from './SimulatorMask.module.css'
 import cursorStyles from './cursor.module.css'
-export class SimulatorMask {
+export class SimulatorMask extends EventTarget {
   shown: boolean = false
   wrapper = document.createElement('div')
   motion: Motion | null = null
@@ -19,6 +19,8 @@ export class SimulatorMask {
   #targetCursorY = 0
   constructor() {
+    super()
     this.wrapper.id = 'page-agent-runtime_simulator-mask'
     this.wrapper.className = styles.wrapper
     this.wrapper.setAttribute('data-browser-use-ignore', 'true')
@@ -74,13 +76,34 @@ export class SimulatorMask {
     this.#moveCursorToTarget()
-    window.addEventListener('PageAgent::MovePointerTo', (event: Event) => {
+    // global events
+    // @note Mask should be isolated from the rest of the code.
+    // Global events are easier to manage and cleanup.
+    const movePointerToListener = (event: Event) => {
       const { x, y } = (event as CustomEvent).detail
       this.setCursorPosition(x, y)
-    })
+    }
-    window.addEventListener('PageAgent::ClickPointer', (event: Event) => {
+    const clickPointerListener = () => {
       this.triggerClickAnimation()
+    }
+    const enablePassThroughListener = () => {
+      this.wrapper.style.pointerEvents = 'none'
+    }
+    const disablePassThroughListener = () => {
+      this.wrapper.style.pointerEvents = 'auto'
+    }
+    window.addEventListener('PageAgent::MovePointerTo', movePointerToListener)
+    window.addEventListener('PageAgent::ClickPointer', clickPointerListener)
+    window.addEventListener('PageAgent::EnablePassThrough', enablePassThroughListener)
+    window.addEventListener('PageAgent::DisablePassThrough', disablePassThroughListener)
+    this.addEventListener('dispose', () => {
+      window.removeEventListener('PageAgent::MovePointerTo', movePointerToListener)
+      window.removeEventListener('PageAgent::ClickPointer', clickPointerListener)
+      window.removeEventListener('PageAgent::EnablePassThrough', enablePassThroughListener)
+      window.removeEventListener('PageAgent::DisablePassThrough', disablePassThroughListener)
     })
   }
@@ -177,7 +200,9 @@ export class SimulatorMask {
   }
   dispose() {
+    console.log('dispose SimulatorMask')
     this.motion?.dispose()
     this.wrapper.remove()
+    this.dispatchEvent(new Event('dispose'))
   }
 }
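
Review note on the cleanup pattern: the mask now keeps named listener references and removes them when its own `dispose` event fires. A minimal sketch of that idea follows, assuming a runtime with global `EventTarget` and `Event`. `bus` is a hypothetical stand-in for `window`, and `Mask` is a simplified illustration, not the real `SimulatorMask`.

```typescript
// `bus` stands in for `window` so the sketch runs outside a browser.
const bus = new EventTarget()

class Mask extends EventTarget {
  clicks = 0

  constructor() {
    super()
    // Keep a named reference so the same function can be removed later.
    const onClick = () => {
      this.clicks++
    }
    bus.addEventListener('PageAgent::ClickPointer', onClick)

    // Cleanup is registered next to the setup it undoes.
    this.addEventListener('dispose', () => {
      bus.removeEventListener('PageAgent::ClickPointer', onClick)
    })
  }

  dispose() {
    this.dispatchEvent(new Event('dispose'))
  }
}

const mask = new Mask()
bus.dispatchEvent(new Event('PageAgent::ClickPointer')) // handled
mask.dispose()
bus.dispatchEvent(new Event('PageAgent::ClickPointer')) // ignored after dispose
const clicksAfterDispose = mask.clicks // 1
```

Anonymous arrow functions passed directly to `addEventListener`, as in the removed lines, cannot be unregistered later, which is presumably why the listeners were pulled out into constants.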


@@ -48,15 +48,33 @@ export async function waitFor(seconds: number): Promise<void> {
   await new Promise((resolve) => setTimeout(resolve, seconds * 1000))
 }
-// ======= dom utils =======
+// ======= mask events =======
-export async function movePointerToElement(element: HTMLElement) {
-  const rect = element.getBoundingClientRect()
+/**
+ * Move the visual pointer to a position within an element.
+ * @param x - x coordinate in the element's document viewport
+ * @param y - y coordinate in the element's document viewport
+ */
+export async function movePointerToElement(element: HTMLElement, x: number, y: number) {
   const offset = getIframeOffset(element)
-  const x = rect.left + rect.width / 2 + offset.x
-  const y = rect.top + rect.height / 2 + offset.y
-  window.dispatchEvent(new CustomEvent('PageAgent::MovePointerTo', { detail: { x, y } }))
+  window.dispatchEvent(
+    new CustomEvent('PageAgent::MovePointerTo', {
+      detail: { x: x + offset.x, y: y + offset.y },
+    })
+  )
   await waitFor(0.3)
 }
+export async function clickPointer() {
+  window.dispatchEvent(new CustomEvent('PageAgent::ClickPointer'))
+}
+export async function enablePassThrough() {
+  window.dispatchEvent(new CustomEvent('PageAgent::EnablePassThrough'))
+}
+export async function disablePassThrough() {
+  window.dispatchEvent(new CustomEvent('PageAgent::DisablePassThrough'))
+}
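
Review note on the new signature: `movePointerToElement` no longer derives the target point from the element's bounding rect; the caller passes `x` and `y`, and the function only adds the iframe offset. A caller that wants the old behaviour (pointing at the element's center) could compute the point roughly as below. `Point`, `RectLike`, `centerOf`, and `toTopLevel` are hypothetical names for illustration, and the numbers are made up.

```typescript
interface Point {
  x: number
  y: number
}

interface RectLike {
  left: number
  top: number
  width: number
  height: number
}

// Center of a rect, in the coordinates of the element's own document viewport.
function centerOf(rect: RectLike): Point {
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 }
}

// Shift a frame-local point by the frame's offset, as the new function body does.
function toTopLevel(point: Point, offset: Point): Point {
  return { x: point.x + offset.x, y: point.y + offset.y }
}

const local = centerOf({ left: 10, top: 20, width: 100, height: 40 }) // { x: 60, y: 40 }
const detail = toTopLevel(local, { x: 200, y: 300 }) // { x: 260, y: 340 }
```

In the browser, `local` would come from `element.getBoundingClientRect()`, and `local.x`, `local.y` would be passed as the second and third arguments.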


@@ -1,6 +1,6 @@
 {
   "name": "@page-agent/ui",
-  "version": "1.6.0",
+  "version": "1.7.0",
   "type": "module",
   "main": "./dist/lib/page-agent-ui.js",
   "module": "./dist/lib/page-agent-ui.js",


@@ -369,6 +369,7 @@ export class Panel {
   }
   #createWrapper(): HTMLElement {
+    const taskInputMaxLength = 1000
     const wrapper = document.createElement('div')
     wrapper.id = 'page-agent-runtime_agent-panel'
     wrapper.className = styles.wrapper
@@ -406,7 +407,7 @@ export class Panel {
           <input
             type="text"
             class="${styles.taskInput}"
-            maxlength="200"
+            maxlength="${taskInputMaxLength}"
           />
         </div>
       </div>


@@ -1,7 +1,7 @@
 {
   "name": "@page-agent/website",
   "private": true,
-  "version": "1.6.0",
+  "version": "1.7.0",
   "type": "module",
   "scripts": {
     "dev": "vite --host 0.0.0.0",
@@ -19,13 +19,13 @@
     "@types/react-dom": "^19.2.1",
     "class-variance-authority": "^0.7.1",
     "clsx": "^2.1.1",
-    "lucide-react": "^0.577.0",
-    "motion": "^12.37.0",
+    "lucide-react": "^1.7.0",
+    "motion": "^12.38.0",
     "next-themes": "^0.4.6",
     "react": "^19.2.4",
     "react-dom": "^19.2.4",
     "rough-notation": "^0.5.1",
-    "simple-icons": "^16.12.0",
+    "simple-icons": "^16.14.0",
     "sonner": "^2.0.7",
     "tailwind-merge": "^3.5.0",
     "tailwindcss": "^4.1.14",


@@ -8,8 +8,8 @@ export default function LanguageSwitcher() {
const dropdownRef = useRef<HTMLDivElement>(null) const dropdownRef = useRef<HTMLDivElement>(null)
const languages = [ const languages = [
{ code: 'zh-CN' as const, label: '中文' },
{ code: 'en-US' as const, label: 'English' }, { code: 'en-US' as const, label: 'English' },
{ code: 'zh-CN' as const, label: '中文' },
] ]
const currentLanguage = languages.find((lang) => lang.code === language) || languages[0] const currentLanguage = languages.find((lang) => lang.code === language) || languages[0]


@@ -1,8 +1,8 @@
 // Demo build (auto-init with demo LLM, for quick testing)
 export const CDN_DEMO_URL =
-  'https://cdn.jsdelivr.net/npm/page-agent@1.6.0/dist/iife/page-agent.demo.js'
+  'https://cdn.jsdelivr.net/npm/page-agent@1.7.0/dist/iife/page-agent.demo.js'
 export const CDN_DEMO_CN_URL =
-  'https://registry.npmmirror.com/page-agent/1.6.0/files/dist/iife/page-agent.demo.js'
+  'https://registry.npmmirror.com/page-agent/1.7.0/files/dist/iife/page-agent.demo.js'
 // Demo LLM for website testing (homepage quick trial uses flash)
 export const DEMO_MODEL = 'qwen3.5-flash'


@@ -45,6 +45,7 @@ export default function DocsLayout({ children }: DocsLayoutProps) {
   { title: isZh ? '知识注入' : 'Instructions', path: '/features/custom-instructions' },
   { title: isZh ? '数据脱敏' : 'Data Masking', path: '/features/data-masking' },
   { title: isZh ? 'Chrome 扩展' : 'Chrome Extension', path: '/features/chrome-extension' },
+  { title: 'MCP Server (Beta)', path: '/features/mcp-server' },
   {
     title: isZh ? '接入第三方 Agent' : 'Third-party Agent',
     path: '/features/third-party-agent',


@@ -100,7 +100,7 @@ console.log(result.history) // Full execution history`}
 >
   AgentConfig
 </Link>{' '}
-{' '}
+PanelConfig {' '}
 <Link
   href="/advanced/page-controller#configuration"
   className="text-blue-600 dark:text-blue-400 hover:underline"
@@ -125,7 +125,7 @@ console.log(result.history) // Full execution history`}
 >
   AgentConfig
 </Link>{' '}
-and{' '}
+, PanelConfig, and{' '}
 <Link
   href="/advanced/page-controller#configuration"
   className="text-blue-600 dark:text-blue-400 hover:underline"


@@ -199,7 +199,9 @@ interface ExecuteConfig {
   model: string // Model name
   apiKey?: string // LLM AK
+  systemInstruction?: string // Global system-level instructions
   includeInitialTab?: boolean
+  experimentalIncludeAllTabs?: boolean // Control all unpinned tabs in the window
   onStatusChange?: (status: AgentStatus) => void
   onActivity?: (activity: AgentActivity) => void
   onHistoryUpdate?: (history: HistoricalEvent[]) => void
@@ -233,6 +235,7 @@ const result = await window.PAGE_AGENT_EXT.execute(
   apiKey: 'your-api-key',
   model: 'gpt-5.2',
   // includeInitialTab: false, // 设为 false 排除初始标签页
+  // experimentalIncludeAllTabs: true, // 控制窗口内所有非固定标签页
   onStatusChange: status => console.log('状态变化:', status),
   onActivity: activity => console.log('活动:', activity),
   onHistoryUpdate: history => console.log('历史更新:', history)
@@ -248,6 +251,7 @@ const result = await window.PAGE_AGENT_EXT.execute(
   apiKey: 'your-api-key',
   model: 'gpt-5.2',
   // includeInitialTab: false, // Set to false to exclude initial tab
+  // experimentalIncludeAllTabs: true, // Control all unpinned tabs in the window
   onStatusChange: status => console.log('Status change:', status),
   onActivity: activity => console.log('Activity:', activity),
   onHistoryUpdate: history => console.log('History update:', history)


@@ -0,0 +1,70 @@
import BetaNotice from '@/components/BetaNotice'
import CodeEditor from '@/components/CodeEditor'
import { Heading } from '@/components/Heading'
export default function McpServerPage() {
return (
<div>
<h1 className="text-4xl font-bold mb-6">MCP Server (Beta)</h1>
<BetaNotice />
<p className="text-xl text-gray-600 dark:text-gray-300 mb-8 leading-relaxed">
Use the MCP server to let your local agent send natural-language browser tasks to Page Agent
Ext.
</p>
<section className="mb-10">
<Heading id="quick-start" className="text-2xl font-bold mb-4">
How to use
</Heading>
<div className="space-y-4">
<div className="p-4 bg-blue-50 dark:bg-blue-950/20 rounded-lg border border-blue-200 dark:border-blue-800">
<p className="text-sm text-blue-900 dark:text-blue-200 leading-7">
1. Install Page Agent Ext in Chrome.
<br />
2. Add the MCP server to your local agent client.
<br />
3. Start the client and approve the Hub connection in the browser when prompted.
<br />
4. Ask your agent to do something in the browser. The client will call execute_task
for you.
</p>
</div>
<CodeEditor
code={`{
"mcpServers": {
"page-agent": {
"command": "npx",
"args": ["-y", "@page-agent/mcp"],
"env": {
"LLM_BASE_URL": "https://api.openai.com/v1",
"LLM_API_KEY": "sk-xxx",
"LLM_MODEL_NAME": "gpt-5.2"
}
}
}
}`}
language="json"
/>
</div>
</section>
<section className="mb-10">
<Heading id="the-hub" className="text-2xl font-bold mb-4">
The Hub
</Heading>
<p className="text-gray-700 dark:text-gray-300 leading-relaxed">
The Hub is the control center for communication between Page Agent Ext and external
callers.
</p>
<p className="text-gray-700 dark:text-gray-300 leading-relaxed">
When the MCP server starts, it opens a local launcher page. The launcher asks the
extension to open the Hub tab, and the Hub receives tasks from your local agent. MCP uses
this path, but the Hub itself is the extension's general external communication entry
point.
</p>
</section>
</div>
)
}


@@ -9,6 +9,7 @@ const BASELINE = new Set([
   'claude-haiku-4.5',
   'gemini-3-flash',
   'deepseek-3.2',
+  'qwen3.6-plus',
   'qwen3.5-plus',
   'qwen3.5-flash',
 ])
@@ -16,6 +17,7 @@ const BASELINE = new Set([
 // Models grouped by brand, newest first
 const MODEL_GROUPS: Record<string, string[]> = {
   Qwen: [
+    'qwen3.6-plus',
     'qwen3.5-plus',
     'qwen3.5-flash',
     'qwen3-coder-next',
@@ -33,8 +35,8 @@ const MODEL_GROUPS: Record<string, string[]> = {
'claude-haiku-4.5', 'claude-haiku-4.5',
'claude-sonnet-3.5', 'claude-sonnet-3.5',
], ],
xAI: ['grok-4.1-fast', 'grok-4', 'grok-code-fast'],
MiniMax: ['MiniMax-M2.7', 'MiniMax-M2.7-highspeed', 'MiniMax-M2.5', 'MiniMax-M2.5-highspeed'], MiniMax: ['MiniMax-M2.7', 'MiniMax-M2.7-highspeed', 'MiniMax-M2.5', 'MiniMax-M2.5-highspeed'],
xAI: ['grok-4.1-fast', 'grok-4', 'grok-code-fast'],
MoonshotAI: ['kimi-k2.5'], MoonshotAI: ['kimi-k2.5'],
'Z.AI': ['glm-5', 'glm-4.7'], 'Z.AI': ['glm-5', 'glm-4.7'],
} }
@@ -181,7 +183,7 @@ const pageAgent = new PageAgent({
   </a>
 </p>
 <CodeEditor
-  code={`# qwen3.5-plus (default for demos) or qwen3.5-flash (lighter)
+  code={`# qwen3.5-plus / qwen3.5-flash
 LLM_BASE_URL="https://page-ag-testing-ohftxirgbn.cn-shanghai.fcapp.run"
 LLM_MODEL_NAME="qwen3.5-plus"
 LLM_API_KEY="NA"`}


@@ -13,6 +13,7 @@ import ChromeExtension from './features/chrome-extension/page'
 import Instructions from './features/custom-instructions/page'
 import CustomTools from './features/custom-tools/page'
 import DataMasking from './features/data-masking/page'
+import McpServerPage from './features/mcp-server/page'
 import Models from './features/models/page'
 import ThirdPartyAgent from './features/third-party-agent/page'
 import Limitations from './introduction/limitations/page'
@@ -80,6 +81,11 @@ export default function DocsRouter() {
       <ChromeExtension />
     </DocsPage>
   </Route>
+  <Route path="/features/mcp-server">
+    <DocsPage>
+      <McpServerPage />
+    </DocsPage>
+  </Route>
   <Route path="/features/third-party-agent">
     <DocsPage>
       <ThirdPartyAgent />


@@ -58,6 +58,22 @@ export default function OneMoreThingSection() {
   </Link>
 </div>
+<div className="mb-10 rounded-2xl border border-blue-200/70 dark:border-blue-800/70 bg-linear-to-r from-blue-50 to-white dark:from-blue-950/30 dark:to-gray-900 px-5 py-4 max-w-3xl mx-auto text-left sm:text-center">
+  <p className="text-sm text-gray-700 dark:text-gray-300 leading-7">
+    {isZh
+      ? '从 Claude Desktop、Copilot 或其他本地 Agent 直接发起浏览器任务?'
+      : 'Using Claude Desktop, Copilot, or another local agent? Connect it to the extension with the MCP server.'}
+  </p>
+  <p>
+    <Link
+      href="/docs/features/mcp-server"
+      className="font-medium text-blue-700 dark:text-blue-300 underline underline-offset-4"
+    >
+      {isZh ? '查看 MCP 文档' : 'Read the MCP docs'}
+    </Link>
+  </p>
+</div>
 <div className="grid sm:grid-cols-3 gap-5 text-left max-w-3xl mx-auto">
   {[
     {
@@ -67,16 +83,16 @@ export default function OneMoreThingSection() {
         : 'Run tasks across multiple pages and tabs without being limited to a single page context',
     },
     {
-      title: isZh ? '页面发起控制' : 'Control from Your Page',
+      title: isZh ? '页面发起控制' : 'Control from a WebPage',
       desc: isZh
         ? '在页面 JS 中发起任务,驱动整个浏览器完成跨标签操作'
-        : 'Trigger tasks from page JS to drive the entire browser across tabs',
+        : 'Trigger tasks from in-page JS to drive the entire browser across tabs',
     },
     {
-      title: isZh ? '外部发起任务' : 'External Triggers',
+      title: isZh ? '外部发起任务' : 'External Caller',
       desc: isZh
         ? '页面 JS、本地 Agent 或云端 Agent 均可通过扩展发起任务'
-        : 'Page JS, local agents, or cloud agents can trigger tasks through the extension',
+        : 'Local agents and cloud agents can control user browser through the extension',
     },
   ].map((item) => (
     <MagicCard


@@ -27,6 +27,7 @@ const SPA_ROUTES = [
   'docs/features/custom-instructions',
   'docs/features/models',
   'docs/features/chrome-extension',
+  'docs/features/mcp-server',
   'docs/features/third-party-agent',
   'docs/advanced/page-agent',
   'docs/advanced/page-agent-core',