diff --git a/AGENTS.md b/AGENTS.md
index 46fb477..5252909 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -22,6 +22,7 @@ npm start # Start website dev server
 npm run build # Build all packages
 npm run build:libs # Build all libraries
 npm run lint # ESLint with TypeScript strict rules
+npm run zip -w @page-agent/ext # Zip the extension package
 ```

 ## Architecture
@@ -36,7 +37,7 @@ packages/
 ├── page-agent/ # npm: "page-agent" entry class (with UI + controller + demo builds)
 ├── website/ # @page-agent/website (private)
 ├── llms/ # @page-agent/llms
-├── extension/ # 🚧 WIP: Browser extension (WXT + React)
+├── extension/ # Browser extension (WXT + React)
 ├── page-controller/ # @page-agent/page-controller
 └── ui/ # @page-agent/ui
 ```
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 2c75a38..541f424 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -20,10 +20,11 @@ Thank you for your interest in contributing to Page-Agent! We welcome contributi

 ### Project Structure

-This is a **monorepo** with npm workspaces containing **3 main packages**:
+This is a **monorepo** with npm workspaces containing **4 main packages**:

 - **Page Agent** (`packages/page-agent/`) - Main entry with built-in UI Panel, published as `page-agent` on npm
 - **Core** (`packages/core/`) - Core agent logic without UI (npm: `@page-agent/core`)
+- **Extension** (`packages/extension/`) - Chrome extension for multi-page tasks and browser-level automation
 - **Website** (`packages/website/`) - React documentation and landing page. Also as demo and test page for the core lib. private package `@page-agent/website`

 We use a simplified monorepo solution with `native npm-workspace + ts reference + vite alias`. No fancy tooling. Hoisting is required.
@@ -145,6 +146,16 @@ If your lame AI assistant does not support [AGENTS.md](https://agents.md/). Add
 npm start
 ```

+### Extension Development
+
+```bash
+npm run dev -w @page-agent/ext
+npm run zip -w @page-agent/ext
+```
+
+- Load extension in Chrome via `chrome://extensions` -> **Load unpacked**
+- Use `packages/extension/docs/extension_api.md` (EN) or `packages/extension/docs/extension_api_zh.md` (ZH) for API integration details
+
 ### Testing on Other Websites

 - Start and serve a local `iife` script
@@ -193,14 +204,6 @@ By contributing to this project, you agree that your contributions will be licen

 > You may need to sign a github CLA before you create a PR.

-### Browser-Use Attribution
-
-Parts of this project are derived from the [browser-use](https://github.com/browser-use/browser-use) project (MIT License). When contributing to DOM-related functionality:
-
-- Maintain existing attribution comments
-- Follow similar patterns for consistency
-- Credit browser-use for derived concepts
-
 ## 💬 Questions?

 - Open a GitHub issue for technical questions
diff --git a/README-zh.md b/README-zh.md
index c8ea417..e7328aa 100644
--- a/README-zh.md
+++ b/README-zh.md
@@ -1,4 +1,4 @@
-# PageAgent 🤖🪄
+# Page Agent
@@ -19,18 +19,16 @@

 ## ✨ Features

-- **🎯 轻松集成**
-  - 无需 Python,无需无头浏览器,无需浏览器插件。纯页面内脚本。
-- **🔐 端侧运行**
-- **🧠 HTML 脱水**
-- **💬 自然语言接口**
-- **🎨 HITL 交互界面**
-
-以及 😉
-
-- **🧪 实验性的 Chrome 扩展,支持跨页面控制** - `packages/extension`
-
-👉 [**🗺️ Roadmap**](https://github.com/alibaba/page-agent/issues/96)
+- **🎯 轻松集成**
+  - 无需 `浏览器插件` / `Python` / `无头浏览器`。
+  - 纯页面内 JavaScript,一切都在你的网页中完成。
+  - The best tool for your agent to control web pages.
+- **📖 基于文本的 DOM 操作**
+  - 无需截图,无需 OCR 或多模态模型。
+  - 无需特殊权限。
+- **🧠 用你自己的 LLM**
+- **🎨 精美 UI,支持人机协同**
+- **🐙 可选的 [Chrome 扩展](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension),支持跨页面任务。**

 ## 🚀 快速开始

@@ -39,20 +37,16 @@
 通过我们免费的 Demo LLM 快速体验 PageAgent:

 ```html
-
+
 ```

-> - **⚠️ 仅用于技术评估。** Demo LLM 有速率和使用限制,可能随时变更。
-> - **🌷 建议使用自己的 LLM API。**
-
 | Mirrors | URL |
 | ------- | ---------------------------------------------------------------------------------- |
 | Global | https://cdn.jsdelivr.net/npm/page-agent@1.2.0/dist/iife/page-agent.demo.js |
 | China | https://registry.npmmirror.com/page-agent/1.2.0/files/dist/iife/page-agent.demo.js |

+> **⚠️ 仅用于技术评估。** Demo LLM 有速率和使用限制,速度较慢,可能随时变更。
+
 ### NPM 安装

 ```bash
@@ -72,7 +66,7 @@ const agent = new PageAgent({
 await agent.execute('点击登录按钮')
 ```

-适用于无法使用 NPM 的环境,我们也提供了 IIFE 构建的 CDN 方式。[@see CDN Usage](https://alibaba.github.io/page-agent/#/docs/integration/cdn-setup)
+更多编程用法,请参阅 [📖 文档](https://alibaba.github.io/page-agent/#/docs/introduction/overview)。

 ## 🏗️ 架构设计

@@ -80,12 +74,13 @@ PageAgent adopts a simplified monorepo structure:

 ```
 packages/
-├── core/ # ** Core agent logic without UI(npm: @page-agent/core) **
-├── page-agent/ # Exported agent and demo(npm: page-agent)
+├── core/ # ** Core agent logic (npm: @page-agent/core) **
 ├── llms/ # LLM 客户端 (npm: @page-agent/llms)
-├── page-controller/ # DOM 操作 & 蒙层 & 模拟鼠标 (npm: @page-agent/page-controller)
-├── ui/ # 面板 & i18n (npm: @page-agent/ui)
-└── website/ # 文档站点
+├── page-controller/ # DOM 操作 (npm: @page-agent/page-controller)
+├── ui/ # 面板 UI (npm: @page-agent/ui)
+├── page-agent/ # 入口类 & iife 包 (npm: page-agent)
+├── extension/ # Chrome 扩展,支持跨页面任务
+└── website/ # 网站 & 文档站点
 ```

 ## 🤝 贡献
diff --git a/README.md b/README.md
index fc1eaaa..f2a509e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# PageAgent 🤖🪄
+# Page Agent
@@ -19,18 +19,16 @@ The GUI Agent Living in Your Webpage. Control web interfaces with natural langua

 ## ✨ Features

-- **🎯 Easy Integration**
-  - No python. No headless browser. No browser extension. Just in-page scripts.
-- **🔐 Client-Side Processing**
-- **🧠 DOM Extraction**
-- **💬 Natural Language Interface**
-- **🎨 UI with Human in the loop**
-
-And 😉
-
-- **🧪 `cross-page` control with an experimental chrome extension** - `packages/extension`
-
-👉 [**🗺️ Roadmap**](https://github.com/alibaba/page-agent/issues/96)
+- **🎯 Easy integration**
+  - No need for `browser extension` / `python` / `headless browser`.
+  - Just in-page javascript. Everything happens in your web page.
+  - The best tool for your agent to control web pages.
+- **📖 Text-based DOM manipulation**
+  - No screenshots. No OCR or multi-modal LLMs needed.
+  - No special permissions required.
+- **🧠 Bring your own LLMs**
+- **🎨 Pretty UI with human-in-the-loop**
+- **🐙 Optional [chrome extension](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension) for multi-page tasks.**

 ## 🚀 Quick Start

@@ -39,20 +37,16 @@ And 😉
 Fastest way to try PageAgent with our free Demo LLM:

 ```html
-
+
 ```

-> - **⚠️ For technical evaluation only.** Demo LLM has rate limits and usage restrictions. May change without notice.
-> - **🌷 Bring your own LLM API.**
-
 | Mirrors | URL |
 | ------- | ---------------------------------------------------------------------------------- |
 | Global | https://cdn.jsdelivr.net/npm/page-agent@1.2.0/dist/iife/page-agent.demo.js |
 | China | https://registry.npmmirror.com/page-agent/1.2.0/files/dist/iife/page-agent.demo.js |

+> **⚠️ For technical evaluation only.** Demo LLM has rate limits and usage restrictions. Slow. May change without notice.
+
 ### NPM Installation

 ```bash
@@ -72,18 +66,21 @@ const agent = new PageAgent({
 await agent.execute('Click the login button')
 ```

+For more programmatic usage, see [📖 Documentation](https://alibaba.github.io/page-agent/#/docs/introduction/overview).
+
 ## 🏗️ Structure

 PageAgent adopts a simplified monorepo structure:

 ```
 packages/
-├── core/ # ** Core agent logic without UI(npm: @page-agent/core) **
-├── page-agent/ # Exported agent and demo(npm: page-agent)
+├── core/ # ** Core agent logic (npm: @page-agent/core) **
 ├── llms/ # LLM client (npm: @page-agent/llms)
-├── page-controller/ # DOM operations & Visual Mask (npm: @page-agent/page-controller)
-├── ui/ # Panel & i18n (npm: @page-agent/ui)
-└── website/ # Demo & Documentation site
+├── page-controller/ # DOM operations (npm: @page-agent/page-controller)
+├── ui/ # Panel UI (npm: @page-agent/ui)
+├── page-agent/ # Entry class and iife builds (npm: page-agent)
+├── extension/ # Chrome extension for multi-page tasks
+└── website/ # Website & Documentation site
 ```

 ## 🤝 Contributing
diff --git a/packages/extension/docs/extension_api.md b/packages/extension/docs/extension_api.md
index 2a0bbda..32a52df 100644
--- a/packages/extension/docs/extension_api.md
+++ b/packages/extension/docs/extension_api.md
@@ -1,12 +1,18 @@
 # Page Agent Extension API

-This document describes how to integrate the Page Agent browser extension into your web application.
+Integrate the Page Agent extension into your web app and trigger multi-page browser tasks from page JavaScript.

 ## Installation

 ### 1. Install the browser extension

-Install the Page Agent extension from the Chrome Web Store.
+Primary channel:
+
+- Chrome Web Store: https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj
+
+Latest updates are often published earlier on:
+
+- GitHub Releases: https://github.com/alibaba/page-agent/releases

 ### 2. Install type definitions (recommended)

@@ -14,11 +20,19 @@ Install the Page Agent extension from the Chrome Web Store.
 npm install @page-agent/core --save-dev
 ```

-### 3. Set up authentication
+### 3. Authorization (Token)

-The extension only injects APIs when it detects a valid token in `localStorage`.
+The token allows your page JS to call the extension API (`window.PAGE_AGENT_EXT`) and execute multi-page tasks.

-1. Open the extension's side panel to get your authorization token
+Why token-based access is required:
+
+- The extension has broad browser permissions (page access, navigation, multi-tab control).
+- If abused, it can harm user privacy and security.
+- Users must explicitly provide the token only to applications they trust.
+
+Setup:
+
+1. Open the extension side panel and copy your auth token.
 2. Set the token in your page:

 ```typescript
@@ -60,36 +74,36 @@ if (await waitForExtension()) {

 ## Global API

-The extension injects the following APIs into the `window` object:
+After token match, the extension injects APIs into `window`.

 ### `window.PAGE_AGENT_EXT_VERSION`

-Extension version string (e.g., `"1.0.0"`). This is exposed separately to allow version checking before accessing the main API object.
+Extension version string (for capability checks before using the main API).

 ### `window.PAGE_AGENT_EXT`

-Main API namespace object containing:
+Main namespace object.

 #### `PAGE_AGENT_EXT.execute(task, config)`

-Execute an agent task.
+Execute one agent task.

-**Parameters:**
+Parameters:

 | Name | Type | Required | Description |
-|------|------|----------|-------------|
+| ---- | ---- | -------- | ----------- |
 | `task` | `string` | Yes | Task description |
-| `config` | `ExecuteConfig` | Yes | Execution configuration (LLM settings, options, and event callbacks) |
+| `config` | `ExecuteConfig` | Yes | LLM settings, options, and callbacks |

-**Returns:** `Promise`
+Returns: `Promise`

 #### `PAGE_AGENT_EXT.dispose()`

-Stop and destroy the current running agent.
+Stop the current task.

 ## Types

-Install `@page-agent/core` for full type definitions:
+Install `@page-agent/core` for complete types:

 ```typescript
 import type {
@@ -104,10 +118,7 @@ export interface ExecuteConfig {
   apiKey: string
   model: string

-  /**
-   * Whether to include the initial tab (that holds this main world script) in the task.
-   * @default true
-   */
+  // Include the initial tab where page JS starts. Default: true.
   includeInitialTab?: boolean

   onStatusChange?: (status: AgentStatus) => void
@@ -119,20 +130,13 @@ export interface ExecuteConfig {
 export type Execute = (task: string, config: ExecuteConfig) => Promise
 ```

-### AgentStatus
+`AgentStatus`

 ```typescript
 type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
 ```

-| Status | Description |
-|--------|-------------|
-| `idle` | Agent is idle, ready to execute |
-| `running` | Agent is executing a task |
-| `completed` | Task completed successfully |
-| `error` | Task failed with an error |
-
-### AgentActivity
+`AgentActivity`

 ```typescript
 type AgentActivity =
@@ -143,15 +147,7 @@ type AgentActivity =
   | { type: 'error'; message: string }
 ```

-| Type | Description |
-|------|-------------|
-| `thinking` | Agent is analyzing the page and planning |
-| `executing` | Agent is executing a tool action |
-| `executed` | Tool execution completed |
-| `retrying` | Retrying after a failure |
-| `error` | An error occurred |
-
-### HistoricalEvent
+`HistoricalEvent`

 ```typescript
 type HistoricalEvent =
@@ -162,7 +158,7 @@ type HistoricalEvent =
   | { type: 'error'; message: string; rawResponse?: unknown }
 ```

-### ExecutionResult
+`ExecutionResult`

 ```typescript
 interface ExecutionResult {
@@ -183,81 +179,22 @@ const result = await window.PAGE_AGENT_EXT!.execute(
     baseURL: 'https://api.openai.com/v1',
     apiKey: process.env.OPENAI_API_KEY!,
     model: 'gpt-5.2',
-  }
-)
-
-if (result.success) {
-  console.log('Task completed:', result.data)
-} else {
-  console.error('Task failed')
-}
-```
-
-### Exclude Initial Tab
-
-By default, the agent includes the initial tab (where the script runs) in the task. Set `includeInitialTab: false` to exclude it:
-
-```typescript
-const result = await window.PAGE_AGENT_EXT!.execute(
-  'Open a new tab and search for page-agent on GitHub',
-  {
-    baseURL: 'https://api.openai.com/v1',
-    apiKey: process.env.OPENAI_API_KEY!,
-    model: 'gpt-5.2',
-    includeInitialTab: false, // Agent will open new tabs only
+    includeInitialTab: false, // Optional: exclude current tab
+    onStatusChange: (status) => console.log(status),
+    onActivity: (activity) => console.log(activity),
   }
 )
 ```

-### With Event Callbacks
+### Stop the Current Task

 ```typescript
-await window.PAGE_AGENT_EXT!.execute('Navigate to the settings page', {
-  baseURL: 'https://api.openai.com/v1',
-  apiKey: process.env.OPENAI_API_KEY!,
-  model: 'gpt-5.2',
-  onStatusChange: (status) => {
-    updateUI({ agentStatus: status })
-  },
-  onActivity: (activity) => {
-    switch (activity.type) {
-      case 'thinking':
-        showSpinner('Agent is thinking...')
-        break
-      case 'executing':
-        showSpinner(`Executing: ${activity.tool}`)
-        break
-      case 'executed':
-        log(`${activity.tool} completed in ${activity.duration}ms`)
-        break
-      case 'error':
-        showError(activity.message)
-        break
-    }
-  },
-  onHistoryUpdate: (history) => {
-    renderHistory(history)
-  },
-})
-```
-
-### Stop Execution
-
-```typescript
-// Start a task
-window.PAGE_AGENT_EXT!.execute('Scroll through all pages', {
-  baseURL: 'https://api.openai.com/v1',
-  apiKey: process.env.OPENAI_API_KEY!,
-  model: 'gpt-5.2',
-})
-
-// Later, stop it
 window.PAGE_AGENT_EXT!.dispose()
 ```

 ## Window Type Declaration

-If not using `@page-agent/core`, add this to your project:
+If you are not importing `@page-agent/core`, add:

 ```typescript
 import type {
@@ -283,7 +220,7 @@ declare global {
     PAGE_AGENT_EXT_VERSION?: string
     PAGE_AGENT_EXT?: {
       version: string
-      execute: (task: string, config: ExecuteConfig) => Promise
+      execute: Execute
       dispose: () => void
     }
   }
diff --git a/packages/extension/docs/extension_api_zh.md b/packages/extension/docs/extension_api_zh.md
index 7cc64fd..6814aa2 100644
--- a/packages/extension/docs/extension_api_zh.md
+++ b/packages/extension/docs/extension_api_zh.md
@@ -1,12 +1,18 @@
 # Page Agent 浏览器插件 API

-本文档介绍如何在网页应用中接入 Page Agent 浏览器插件。
+在你的网页应用中接入 Page Agent 插件,并通过页面 JavaScript 发起多页面浏览器任务。

 ## 安装

 ### 1. 安装浏览器插件

-从 Chrome 应用商店安装 Page Agent 插件。
+首选渠道:
+
+- Chrome 应用商店:https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj
+
+通常更快提供最新构建的渠道:
+
+- GitHub Releases:https://github.com/alibaba/page-agent/releases

 ### 2. 安装类型定义(推荐)

@@ -14,11 +20,19 @@
 npm install @page-agent/core --save-dev
 ```

-### 3. 配置认证
+### 3. 授权(Token)

-插件在页面加载后检测 `localStorage` 中的 token,匹配时才会注入 API。
+token 用于让页面 JS 调用扩展 API(`window.PAGE_AGENT_EXT`)并执行多页面任务。

-1. 打开插件的侧边栏面板,获取授权 token
+为什么必须使用 token:
+
+- 插件具备较广的浏览器权限(页面访问、导航、多标签控制)。
+- 若被滥用,可能危害用户隐私与安全。
+- 用户必须主动将 token 提供给其信任的应用。
+
+配置步骤:
+
+1. 在扩展侧边栏中复制 auth token。
 2. 在页面中设置 token:

 ```typescript
@@ -60,32 +74,32 @@ if (await waitForExtension()) {

 ## 全局 API

-插件在 `window` 对象上注入以下 API:
+token 匹配后,插件会在 `window` 上注入 API。

 ### `window.PAGE_AGENT_EXT_VERSION`

-插件版本号字符串(例如 `"1.0.0"`)。单独暴露版本号,方便在访问主 API 对象前进行版本检查。
+插件版本号字符串,可用于在访问主 API 前做能力检查。

 ### `window.PAGE_AGENT_EXT`

-主 API 命名空间对象,包含:
+主命名空间对象。

 #### `PAGE_AGENT_EXT.execute(task, config)`

 执行 Agent 任务。

-**参数:**
+参数:

 | 名称 | 类型 | 必填 | 说明 |
-|------|------|------|------|
+| ---- | ---- | ---- | ---- |
 | `task` | `string` | 是 | 任务描述 |
-| `config` | `ExecuteConfig` | 是 | 执行配置(LLM 设置、选项和事件回调) |
+| `config` | `ExecuteConfig` | 是 | LLM 设置、执行选项和回调 |

-**返回:** `Promise`
+返回:`Promise`

 #### `PAGE_AGENT_EXT.dispose()`

-停止并销毁当前运行的 Agent。
+停止当前任务。

 ## 类型定义

@@ -104,10 +118,7 @@ export interface ExecuteConfig {
   apiKey: string
   model: string

-  /**
-   * 是否将初始标签页(运行此脚本的页面)包含在任务中。
-   * @default true
-   */
+  // 是否包含启动脚本所在标签页。默认 true。
   includeInitialTab?: boolean

   onStatusChange?: (status: AgentStatus) => void
@@ -119,20 +130,13 @@ export interface ExecuteConfig {
 export type Execute = (task: string, config: ExecuteConfig) => Promise
 ```

-### AgentStatus
+`AgentStatus`

 ```typescript
 type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
 ```

-| 状态 | 说明 |
-|------|------|
-| `idle` | 空闲,准备执行 |
-| `running` | 正在执行任务 |
-| `completed` | 任务成功完成 |
-| `error` | 任务执行失败 |
-
-### AgentActivity
+`AgentActivity`

 ```typescript
 type AgentActivity =
@@ -143,15 +147,7 @@ type AgentActivity =
   | { type: 'error'; message: string }
 ```

-| 类型 | 说明 |
-|------|------|
-| `thinking` | Agent 正在分析页面并规划 |
-| `executing` | 正在执行工具操作 |
-| `executed` | 工具执行完成 |
-| `retrying` | 失败后重试 |
-| `error` | 发生错误 |
-
-### HistoricalEvent
+`HistoricalEvent`

 ```typescript
 type HistoricalEvent =
@@ -162,7 +158,7 @@ type HistoricalEvent =
   | { type: 'error'; message: string; rawResponse?: unknown }
 ```

-### ExecutionResult
+`ExecutionResult`

 ```typescript
 interface ExecutionResult {
@@ -183,81 +179,22 @@ const result = await window.PAGE_AGENT_EXT!.execute(
     baseURL: 'https://api.openai.com/v1',
     apiKey: process.env.OPENAI_API_KEY!,
     model: 'gpt-5.2',
-  }
-)
-
-if (result.success) {
-  console.log('任务完成:', result.data)
-} else {
-  console.error('任务失败')
-}
-```
-
-### 排除初始标签页
-
-默认情况下,Agent 会将初始标签页(运行脚本的页面)包含在任务中。设置 `includeInitialTab: false` 可以排除它:
-
-```typescript
-const result = await window.PAGE_AGENT_EXT!.execute(
-  '打开新标签页并在 GitHub 上搜索 page-agent',
-  {
-    baseURL: 'https://api.openai.com/v1',
-    apiKey: process.env.OPENAI_API_KEY!,
-    model: 'gpt-5.2',
-    includeInitialTab: false, // Agent 只会打开新标签页
+    includeInitialTab: false, // 可选:排除当前标签页
+    onStatusChange: (status) => console.log(status),
+    onActivity: (activity) => console.log(activity),
   }
 )
 ```

-### 使用事件回调
+### 停止当前任务

 ```typescript
-await window.PAGE_AGENT_EXT!.execute('导航到设置页面', {
-  baseURL: 'https://api.openai.com/v1',
-  apiKey: process.env.OPENAI_API_KEY!,
-  model: 'gpt-5.2',
-  onStatusChange: (status) => {
-    updateUI({ agentStatus: status })
-  },
-  onActivity: (activity) => {
-    switch (activity.type) {
-      case 'thinking':
-        showSpinner('Agent 正在思考...')
-        break
-      case 'executing':
-        showSpinner(`正在执行: ${activity.tool}`)
-        break
-      case 'executed':
-        log(`${activity.tool} 完成,耗时 ${activity.duration}ms`)
-        break
-      case 'error':
-        showError(activity.message)
-        break
-    }
-  },
-  onHistoryUpdate: (history) => {
-    renderHistory(history)
-  },
-})
-```
-
-### 停止执行
-
-```typescript
-// 启动任务
-window.PAGE_AGENT_EXT!.execute('滚动浏览所有页面', {
-  baseURL: 'https://api.openai.com/v1',
-  apiKey: process.env.OPENAI_API_KEY!,
-  model: 'gpt-5.2',
-})
-
-// 稍后停止
 window.PAGE_AGENT_EXT!.dispose()
 ```

 ## Window 类型声明

-如果不使用 `@page-agent/core`,可以添加以下声明:
+如果你不直接引入 `@page-agent/core`,可添加以下声明:

 ```typescript
 import type {
@@ -283,7 +220,7 @@ declare global {
     PAGE_AGENT_EXT_VERSION?: string
     PAGE_AGENT_EXT?: {
       version: string
-      execute: (task: string, config: ExecuteConfig) => Promise
+      execute: Execute
       dispose: () => void
     }
   }
diff --git a/packages/website/src/pages/docs/features/chrome-extension/page.tsx b/packages/website/src/pages/docs/features/chrome-extension/page.tsx
index 6266b0f..f4db211 100644
--- a/packages/website/src/pages/docs/features/chrome-extension/page.tsx
+++ b/packages/website/src/pages/docs/features/chrome-extension/page.tsx
@@ -1,11 +1,13 @@
-import { siGithub } from 'simple-icons'
+import { siChromewebstore, siGithub } from 'simple-icons'

-import BetaNotice from '@/components/BetaNotice'
 import CodeEditor from '@/components/CodeEditor'
 import { useLanguage } from '@/i18n/context'

 export default function ChromeExtension() {
   const { isZh } = useLanguage()
+  const chromeWebStoreUrl =
+    'https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj'
+  const githubReleasesUrl = 'https://github.com/alibaba/page-agent/releases'

   return (
@@ -13,70 +15,92 @@ export default function ChromeExtension() {

{isZh
- ? '可选的 Chrome 扩展,解锁多页任务和第三方 API 集成。'
- : 'Optional Chrome extension that unlocks multi-page tasks and third-party API integration.'}
+ ? '可选的 Chrome 扩展。PageAgent.js 继续负责页面内自动化;扩展 API 额外提供多页面任务、浏览器级控制,以及从浏览器外部发起任务的能力。'
+ : 'An optional Chrome extension. PageAgent.js keeps handling in-page automation, while the extension API adds multi-page tasks, browser-level control, and tasks initiated from outside the browser.'}

- -
- {/* Hero Section */} -
-
-
-

- {isZh
- ? '解锁多页任务!借助 Chrome 扩展,Agent 可以跨标签页和页面导航,突破单页限制。'
- : 'Unlock multi-page tasks! With the Chrome extension, your agent can navigate across tabs and pages, breaking the single-page limitation.'}

-
-
-
- {/* Features */}

{isZh ? '核心特性' : 'Key Features'}

-
+

🔓 {isZh ? '多页任务' : 'Multi-Page Tasks'}

{isZh
- ? '跨多个页面和标签页执行任务,不再局限于单页操作。'
- : 'Execute tasks across multiple pages and tabs. No longer limited to single-page operations.'}
+ ? '跨多个页面和标签页连续执行任务,不再受限于单页上下文。'
+ : 'Run tasks across multiple pages and tabs without being limited to a single page context.'}

- 🔌 {isZh ? '开放第三方接口' : 'Third-Party API'}
+ 🧭 {isZh ? '浏览器级控制' : 'Browser-Level Control'}

{isZh
- ? '用户授权后,你的网页、本地 Agent 或云端 Agent 都能通过扩展操作用户浏览器!'
- : 'After user authorization, your webpage, local agent, or cloud agent can control the browser through the extension.'}
+ ? '支持跨标签导航、页面切换和更完整的浏览器自动化能力。'
+ : 'Enable richer browser automation, including cross-tab navigation and page switching.'}
+

+
+
+

+ 🔌 {isZh ? '开放集成接口' : 'Open Integration API'}
+

+

+ {isZh
+ ? '用户主动授权后,页面 JS、本地 Agent 或云端 Agent 可通过扩展发起多页面任务。'
+ : 'With explicit user authorization, page JS, local agents, or cloud agents can trigger multi-page tasks through the extension.'}

- {/* Download */}
+ {/* Install */}
-

{isZh ? '下载测试版' : 'Download Beta'}

-

- {isZh
- ? '扩展目前处于 Beta 阶段,请从 GitHub Releases 下载最新版本。'
- : 'The extension is currently in beta. Download the latest version from GitHub Releases.'}

- - - - - {isZh ? '前往 GitHub Releases 下载' : 'Download from GitHub Releases'} - +

{isZh ? '获取扩展' : 'Get the Extension'}

+ +
+
+ {/* Relationship with PageAgent.js */}
+

+ {isZh ? '与 PageAgent.js 的关系' : 'How It Relates to PageAgent.js'}
+

+
+

+ {isZh
+ ? 'PageAgent.js 本身即可在页面内完成自动化。Chrome 扩展是可选的能力扩展。'
+ : 'PageAgent.js already works for in-page automation. The Chrome extension is optional, not a dependency.'}
+

+

+ {isZh
+ ? '通过扩展,你可以执行多页面任务、控制浏览器,以及从浏览器外部(本地服务或云端服务)发起任务。'
+ : 'With the extension, you can run multi-page tasks, control the browser, and trigger tasks from outside the browser (local or cloud services).'}
+

+
{/* Third-party Integration */}
@@ -86,32 +110,33 @@ export default function ChromeExtension() {

{isZh
- ? '用户授权后,外部应用可以调用扩展 API 来控制浏览器。'
- : 'After user authorization, external applications can call the extension API to control the browser.'}
+ ? '通过页面 JavaScript 调用 `window.PAGE_AGENT_EXT`,你的应用可以发起跨页面任务并控制浏览器行为。'
+ : 'By calling `window.PAGE_AGENT_EXT` from page JavaScript, your app can trigger multi-page tasks and control browser behavior.'}

- {/* Auth Flow */}
-

{isZh ? '授权流程' : 'Authorization Flow'}

+

+ {isZh ? '授权与安全' : 'Authorization and Security'}
+

{isZh
- ? '扩展使用基于 Token 的授权机制,扩展端和页面端必须持有匹配的 Token。'
- : 'The extension uses a token-based authorization mechanism. Both extension and page must have matching tokens.'}
+ ? '扩展权限范围较广(例如页面访问、导航、多标签控制)。若被滥用,可能危害用户隐私。为此,调用能力由 Token 保护,用户必须主动将 Token 提供给其信任的应用。'
+ : 'The extension has broad permissions (such as page access, navigation, and multi-tab control). If abused, it can harm user privacy. That is why access is protected by a token, and users must actively share the token only with applications they trust.'}

')`
- : `// 1. User installs extension and sets an auth token in extension settings
-// 2. Your page reads the same token and stores it in localStorage
-// 3. After token match, extension exposes window.PAGE_AGENT_EXT object
+ : `// 1) Get auth token from the extension side panel
+// 2) Set it only in trusted applications
+// 3) After token match, extension exposes window.PAGE_AGENT_EXT

-// ⚠️ Check your extension popup for the auth token
+// ⚠️ Never provide the token to untrusted pages or scripts
localStorage.setItem('PageAgentExtUserAuthToken', '')`
}
language="javascript"
@@ -152,13 +177,87 @@ localStorage.setItem('PageAgentExtUserAuthToken', '')
-

PAGE_AGENT_EXT.execute(task, config)

+ {/* TypeScript Declaration */}
+

+ {isZh ? 'TypeScript 类型声明' : 'TypeScript Declaration'}
+

{isZh
- ? '使用配置执行任务。返回一个 Promise,在任务完成时 resolve。config 参数包含 LLM 设置、选项和事件回调。'
- : 'Execute a task with configuration. Returns a Promise that resolves when the task completes. Config includes LLM settings, options, and event callbacks.'}
+ ? '推荐把 `execute` 的类型声明加入你的项目,获得完整类型提示。'
+ : 'Add this `execute` declaration to your project for full type support.'}

+ void
+  onActivity?: (activity: AgentActivity) => void
+  onHistoryUpdate?: (history: HistoricalEvent[]) => void
+  onDispose?: () => void
+}
+
+type Execute = (task: string, config: ExecuteConfig) => Promise
+
+declare global {
+  interface Window {
+    PAGE_AGENT_EXT_VERSION?: string
+    PAGE_AGENT_EXT?: {
+      version: string
+      execute: Execute
+      dispose: () => void
+    }
+  }
+}`
+ : `import type {
+  AgentActivity,
+  AgentStatus,
+  ExecutionResult,
+  HistoricalEvent
+} from '@page-agent/core'
+
+interface ExecuteConfig {
+  baseURL: string // LLM API endpoint
+  apiKey: string // API key
+  model: string // Model name
+
+  includeInitialTab?: boolean
+  onStatusChange?: (status: AgentStatus) => void
+  onActivity?: (activity: AgentActivity) => void
+  onHistoryUpdate?: (history: HistoricalEvent[]) => void
+  onDispose?: () => void
+}
+
+type Execute = (task: string, config: ExecuteConfig) => Promise
+
+declare global {
+  interface Window {
+    PAGE_AGENT_EXT_VERSION?: string
+    PAGE_AGENT_EXT?: {
+      version: string
+      execute: Execute
+      dispose: () => void
+    }
+  }
+}`
+ }
+ language="typescript"
+ />
+

PAGE_AGENT_EXT.execute(task, config)

+ console.log('状态变化:', status),
onActivity: activity => console.log('活动:', activity),
@@ -184,7 +283,7 @@ const result = await window.PAGE_AGENT_EXT.execute(
{
baseURL: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
- model: 'gpt-5-2',
+ model: 'gpt-5.2',
// includeInitialTab: false, // Set to false to exclude initial tab
onStatusChange: status => console.log('Status change:', status),
onActivity: activity => console.log('Activity:', activity),
@@ -217,111 +316,29 @@ window.PAGE_AGENT_EXT.dispose()`
/>

- {/* ExecuteConfig */}
-

{isZh ? '执行配置' : 'Execute Configuration'}

-

- {isZh
- ? 'config 参数包含 LLM 设置、选项和事件回调,用于控制任务执行行为。'
- : 'The config parameter includes LLM settings, options, and event callbacks to control task execution behavior.'}

-
- void
-
-  // Agent 执行活动时调用(如点击、输入、导航等操作)
-  onActivity?: (activity: AgentActivity) => void
-
-  // 历史记录更新时调用(包含完整的事件历史)
-  onHistoryUpdate?: (history: HistoricalEvent[]) => void
-
-  // Agent 被停止时调用
-  onDispose?: () => void
-}`
- : `interface ExecuteConfig {
-  baseURL: string // LLM API endpoint
-  apiKey: string // API key
-  model: string // Model name
-
-  // Whether to include the initial tab in the task, default true
-  includeInitialTab?: boolean
-
-  // Called when agent status changes (idle, running, error, completed, etc.)
-  onStatusChange?: (status: AgentStatus) => void
-
-  // Called when agent performs an activity (click, input, navigation, etc.)
-  onActivity?: (activity: AgentActivity) => void
-
-  // Called when history is updated (contains full event history)
-  onHistoryUpdate?: (history: HistoricalEvent[]) => void
-
-  // Called when agent is disposed
-  onDispose?: () => void
-}`
- }
- language="typescript"
- />
-
- {/* Security Notice */}
-

- ⚠️ {isZh ? '安全须知' : 'Security Notes'} -

-
    -
-
- •{' '}
- {isZh
- ? '用户必须在扩展设置中显式授权每个域名'
- : 'Users must explicitly authorize each domain in extension settings'}
  • -
-
- •{' '}
- {isZh
- ? '生产环境建议使用后端代理 LLM API Key'
- : 'Consider using backend proxy for LLM API keys in production'}
  • -
-
- {/* Integration Guide */}

{isZh
- ? '将 MultiPageAgent 融入你自己的插件'
+ ? '将 MultiPageAgent 集成到你自己的插件'
: 'Integrate MultiPageAgent into Your Extension'}

{isZh
- ? '你可以将 MultiPageAgent 集成到自己的浏览器扩展中,实现跨页面的 AI 自动化能力。'
- : 'You can integrate MultiPageAgent into your own browser extension for cross-page AI automation capabilities.'}
+ ? '建议先阅读扩展 API 文档,再参考 background entry implementation。'
+ : 'Start with the extension API docs, then use the background entry implementation as a reference.'}
+ + + + + packages/extension/src/entrypoints/background.ts

-

TODO

-

- {isZh ? '参考源码实现:' : 'Reference implementation:'} -

- - - - - packages/extension/src/entrypoints/background.ts -
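The `extension_api.md` hunk header above shows page code calling `waitForExtension()` before it uses `window.PAGE_AGENT_EXT`, but the helper's body is not part of this diff. Below is a minimal TypeScript sketch of such a helper. It assumes the extension injects `PAGE_AGENT_EXT` asynchronously once the `PageAgentExtUserAuthToken` value in `localStorage` matches. The `ExtHost` type, the default timeout, and the polling interval are hypothetical choices, not the documented implementation.

```typescript
// Hypothetical helper, not part of @page-agent/core. It polls a host object
// (normally `window`) until the extension has injected PAGE_AGENT_EXT.

// Anything that may carry the injected API. In a browser this is `window`,
// with the `declare global` block from the docs in scope.
interface ExtHost {
  PAGE_AGENT_EXT?: unknown
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Resolves true as soon as host.PAGE_AGENT_EXT exists, or false once timeoutMs elapses.
async function waitForExtension(
  host: ExtHost,
  timeoutMs = 3000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (host.PAGE_AGENT_EXT !== undefined) return true
    await sleep(intervalMs)
  }
  // Final check in case the API appeared during the last sleep.
  return host.PAGE_AGENT_EXT !== undefined
}
```

In a page, you would first set the token, then call `await waitForExtension(window)` before calling `window.PAGE_AGENT_EXT!.execute(...)`. Because the helper takes the host object as a parameter instead of reading `window` directly, it can also be exercised outside a browser.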
diff --git a/tsconfig.base.json b/tsconfig.base.json
index a559cdc..a31f78c 100644
--- a/tsconfig.base.json
+++ b/tsconfig.base.json
@@ -28,20 +28,5 @@
     "erasableSyntaxOnly": true,
     "noFallthroughCasesInSwitch": true,
     "noUncheckedSideEffectImports": true
-
-    // "paths": {
-    //   // Simplified monorepo solution (raw npm workspace with hoisting)
-    //   "@page-agent/page-controller": ["./packages/page-controller/src/PageController.ts"],
-    //   "page-agent": ["./packages/page-agent/src/PageAgent.ts"]
-    // }
   }
-  // "references": [
-  //   { "path": "./packages/page-controller" },
-  //   { "path": "./packages/page-agent" },
-  //   { "path": "./packages/website" }
-  // ],
-  // "include": ["packages/*/src/**/*.ts", "packages/*/src/**/*.tsx"],
-  // "exclude": ["node_modules", "dist", "packages/*/dist"]
-  // "files": ["env.d.ts"]
-  // "files": []
 }
diff --git a/tsconfig.json b/tsconfig.json
index 5b710ed..f3616a9 100644
--- a/tsconfig.json
+++ b/tsconfig.json
@@ -1,3 +1,5 @@
+// this is only for IDE ts language server to work.
+// do not use this for building or linting.
 {
   "extends": "./tsconfig.base.json",
   "references": [