docs: update extension related docs

Simon
2026-02-12 17:19:14 +08:00
parent 11d66f42c4
commit f19b3cc2cc
9 changed files with 312 additions and 438 deletions


@@ -22,6 +22,7 @@ npm start # Start website dev server
npm run build # Build all packages
npm run build:libs # Build all libraries
npm run lint # ESLint with TypeScript strict rules
npm run zip -w @page-agent/ext # Zip the extension package
```
## Architecture
@@ -36,7 +37,7 @@ packages/
├── page-agent/ # npm: "page-agent" entry class (with UI + controller + demo builds)
├── website/ # @page-agent/website (private)
├── llms/ # @page-agent/llms
├── extension/ # 🚧 WIP: Browser extension (WXT + React)
├── extension/ # Browser extension (WXT + React)
├── page-controller/ # @page-agent/page-controller
└── ui/ # @page-agent/ui
```


@@ -20,10 +20,11 @@ Thank you for your interest in contributing to Page-Agent! We welcome contributi
### Project Structure
This is a **monorepo** with npm workspaces containing **3 main packages**:
This is a **monorepo** with npm workspaces containing **4 main packages**:
- **Page Agent** (`packages/page-agent/`) - Main entry with built-in UI Panel, published as `page-agent` on npm
- **Core** (`packages/core/`) - Core agent logic without UI (npm: `@page-agent/core`)
- **Extension** (`packages/extension/`) - Chrome extension for multi-page tasks and browser-level automation
- **Website** (`packages/website/`) - React documentation and landing page. Also serves as the demo and test page for the core lib. Private package `@page-agent/website`.
We use a simplified monorepo solution with `native npm-workspace + ts reference + vite alias`. No fancy tooling. Hoisting is required.
@@ -145,6 +146,16 @@ If your lame AI assistant does not support [AGENTS.md](https://agents.md/). Add
npm start
```
### Extension Development
```bash
npm run dev -w @page-agent/ext
npm run zip -w @page-agent/ext
```
- Load extension in Chrome via `chrome://extensions` -> **Load unpacked**
- Use `packages/extension/docs/extension_api.md` (EN) or `packages/extension/docs/extension_api_zh.md` (ZH) for API integration details
### Testing on Other Websites
- Start and serve a local `iife` script
@@ -193,14 +204,6 @@ By contributing to this project, you agree that your contributions will be licen
> You may need to sign a GitHub CLA before you create a PR.
### Browser-Use Attribution
Parts of this project are derived from the [browser-use](https://github.com/browser-use/browser-use) project (MIT License). When contributing to DOM-related functionality:
- Maintain existing attribution comments
- Follow similar patterns for consistency
- Credit browser-use for derived concepts
## 💬 Questions?
- Open a GitHub issue for technical questions


@@ -1,4 +1,4 @@
# PageAgent 🤖🪄
# Page Agent
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://img.alicdn.com/imgextra/i4/O1CN01qKig1P1FnhpFKNdi6_!!6000000000532-2-tps-1280-256.png">
@@ -20,17 +20,15 @@
## ✨ Features
- **🎯 轻松集成**
- 无需 Python,无需无头浏览器,无需浏览器插件。纯页面内脚本
- **🔐 端侧运行**
- **🧠 HTML 脱水**
- **💬 自然语言接口**
- **🎨 HITL 交互界面**
以及 😉
- **🧪 实验性的 Chrome 扩展,支持跨页面控制** - `packages/extension`
👉 [**🗺️ Roadmap**](https://github.com/alibaba/page-agent/issues/96)
- 无需 `浏览器插件` / `Python` / `无头浏览器`
- 纯页面内 JavaScript,一切都在你的网页中完成。
- The best tool for your agent to control web pages.
- **📖 基于文本的 DOM 操作**
- 无需截图,无需 OCR 或多模态模型。
- 无需特殊权限。
- **🧠 用你自己的 LLM**
- **🎨 精美 UI,支持人机协同**
- **🐙 可选的 [Chrome 扩展](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension),支持跨页面任务。**
## 🚀 快速开始
@@ -39,20 +37,16 @@
通过我们免费的 Demo LLM 快速体验 PageAgent:
```html
<script
src="https://registry.npmmirror.com/page-agent/1.2.0/files/dist/iife/page-agent.demo.js"
crossorigin="true"
></script>
<script src="{URL}" crossorigin="true"></script>
```
> - **⚠️ 仅用于技术评估。** Demo LLM 有速率和使用限制,可能随时变更。
> - **🌷 建议使用自己的 LLM API。**
| Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.2.0/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.2.0/files/dist/iife/page-agent.demo.js |
> **⚠️ 仅用于技术评估。** Demo LLM 有速率和使用限制,速度较慢,可能随时变更。
### NPM 安装
```bash
@@ -72,7 +66,7 @@ const agent = new PageAgent({
await agent.execute('点击登录按钮')
```
适用于无法使用 NPM 的环境,我们也提供了 IIFE 构建的 CDN 方式。[@see CDN Usage](https://alibaba.github.io/page-agent/#/docs/integration/cdn-setup)
更多编程用法,请参阅 [📖 文档](https://alibaba.github.io/page-agent/#/docs/introduction/overview)
## 🏗️ 架构设计
@@ -80,12 +74,13 @@ PageAgent adopts a simplified monorepo structure:
```
packages/
├── core/ # ** Core agent logic without UI(npm: @page-agent/core) **
├── page-agent/ # Exported agent and demo(npm: page-agent)
├── core/ # ** Core agent logic (npm: @page-agent/core) **
├── llms/ # LLM 客户端 (npm: @page-agent/llms)
├── page-controller/ # DOM 操作 & 蒙层 & 模拟鼠标 (npm: @page-agent/page-controller)
├── ui/ # 面板 & i18n (npm: @page-agent/ui)
└── website/ # 文档站点
├── page-controller/ # DOM 操作 (npm: @page-agent/page-controller)
├── ui/ # 面板 UI (npm: @page-agent/ui)
├── page-agent/ # 入口类 & iife 包 (npm: page-agent)
├── extension/ # Chrome 扩展,支持跨页面任务
└── website/ # 网站 & 文档站点
```
## 🤝 贡献


@@ -1,4 +1,4 @@
# PageAgent 🤖🪄
# Page Agent
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://img.alicdn.com/imgextra/i4/O1CN01qKig1P1FnhpFKNdi6_!!6000000000532-2-tps-1280-256.png">
@@ -19,18 +19,16 @@ The GUI Agent Living in Your Webpage. Control web interfaces with natural langua
## ✨ Features
- **🎯 Easy Integration**
- No python. No headless browser. No browser extension. Just in-page scripts.
- **🔐 Client-Side Processing**
- **🧠 DOM Extraction**
- **💬 Natural Language Interface**
- **🎨 UI with Human in the loop**
And 😉
- **🧪 `cross-page` control with an experimental chrome extension** - `packages/extension`
👉 [**🗺️ Roadmap**](https://github.com/alibaba/page-agent/issues/96)
- **🎯 Easy integration**
- No need for `browser extension` / `Python` / `headless browser`.
- Just in-page JavaScript. Everything happens in your web page.
- The best tool for your agent to control web pages.
- **📖 Text-based DOM manipulation**
- No screenshots. No OCR or multi-modal LLMs needed.
- No special permissions required.
- **🧠 Bring your own LLMs**
- **🎨 Pretty UI with human-in-the-loop**
- **🐙 Optional [Chrome extension](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension) for multi-page tasks.**
## 🚀 Quick Start
@@ -39,20 +37,16 @@ And 😉
Fastest way to try PageAgent with our free Demo LLM:
```html
<script
src="https://cdn.jsdelivr.net/npm/page-agent@1.2.0/dist/iife/page-agent.demo.js"
crossorigin="true"
></script>
<script src="{URL}" crossorigin="true"></script>
```
> - **⚠️ For technical evaluation only.** Demo LLM has rate limits and usage restrictions. May change without notice.
> - **🌷 Bring your own LLM API.**
| Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.2.0/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.2.0/files/dist/iife/page-agent.demo.js |
> **⚠️ For technical evaluation only.** Demo LLM has rate limits and usage restrictions. Slow. May change without notice.
### NPM Installation
```bash
@@ -72,18 +66,21 @@ const agent = new PageAgent({
await agent.execute('Click the login button')
```
For more programmatic usage, see the [📖 Documentation](https://alibaba.github.io/page-agent/#/docs/introduction/overview).
## 🏗️ Structure
PageAgent adopts a simplified monorepo structure:
```
packages/
├── core/ # ** Core agent logic without UI(npm: @page-agent/core) **
├── page-agent/ # Exported agent and demo(npm: page-agent)
├── core/ # ** Core agent logic (npm: @page-agent/core) **
├── llms/ # LLM client (npm: @page-agent/llms)
├── page-controller/ # DOM operations & Visual Mask (npm: @page-agent/page-controller)
├── ui/ # Panel & i18n (npm: @page-agent/ui)
└── website/ # Demo & Documentation site
├── page-controller/ # DOM operations (npm: @page-agent/page-controller)
├── ui/ # Panel UI (npm: @page-agent/ui)
├── page-agent/ # Entry class and iife builds (npm: page-agent)
├── extension/ # Chrome extension for multi-page tasks
└── website/ # Website & Documentation site
```
## 🤝 Contributing


@@ -1,12 +1,18 @@
# Page Agent Extension API
This document describes how to integrate the Page Agent browser extension into your web application.
Integrate the Page Agent extension into your web app and trigger multi-page browser tasks from page JavaScript.
## Installation
### 1. Install the browser extension
Install the Page Agent extension from the Chrome Web Store.
Primary channel:
- Chrome Web Store: https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj
New versions are often published earlier on:
- GitHub Releases: https://github.com/alibaba/page-agent/releases
### 2. Install type definitions (recommended)
@@ -14,11 +20,19 @@ Install the Page Agent extension from the Chrome Web Store.
npm install @page-agent/core --save-dev
```
### 3. Set up authentication
### 3. Authorization (Token)
The extension only injects APIs when it detects a valid token in `localStorage`.
The token allows your page JS to call the extension API (`window.PAGE_AGENT_EXT`) and execute multi-page tasks.
1. Open the extension's side panel to get your authorization token
Why token-based access is required:
- The extension has broad browser permissions (page access, navigation, multi-tab control).
- If abused, it can harm user privacy and security.
- Users must explicitly provide the token only to applications they trust.
Setup:
1. Open the extension side panel and copy your auth token.
2. Set the token in your page:
```typescript
@@ -60,36 +74,36 @@ if (await waitForExtension()) {
## Global API
The extension injects the following APIs into the `window` object:
After token match, the extension injects APIs into `window`.
### `window.PAGE_AGENT_EXT_VERSION`
Extension version string (e.g., `"1.0.0"`). This is exposed separately to allow version checking before accessing the main API object.
Extension version string (for capability checks before using the main API).
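A capability check on the version string could look like the sketch below. It assumes the version is a dotted numeric string such as `"1.0.0"`; `isVersionAtLeast` is a hypothetical helper written for this example, not part of the extension API.

```typescript
// Hypothetical helper (not part of the extension API): compares dotted numeric versions.
function isVersionAtLeast(current: string | undefined, minimum: string): boolean {
  // No version string means the extension has not injected its APIs.
  if (!current) return false

  // Turn "1.2.0" into [1, 2, 0]; non-numeric parts count as 0.
  const parse = (version: string): number[] =>
    version.split('.').map((part) => Number.parseInt(part, 10) || 0)

  const a = parse(current)
  const b = parse(minimum)
  const length = Math.max(a.length, b.length)

  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    if (x !== y) return x > y
  }
  return true // all parts are equal
}

// Literal values keep the sketch runnable outside a browser.
console.log(isVersionAtLeast('1.2.0', '1.0.0'))
console.log(isVersionAtLeast(undefined, '1.0.0'))
```

In a page, the first argument would be `window.PAGE_AGENT_EXT_VERSION`, which stays `undefined` until the token matches.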
### `window.PAGE_AGENT_EXT`
Main API namespace object containing:
Main namespace object.
#### `PAGE_AGENT_EXT.execute(task, config)`
Execute an agent task.
Execute one agent task.
**Parameters:**
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| ---- | ---- | -------- | ----------- |
| `task` | `string` | Yes | Task description |
| `config` | `ExecuteConfig` | Yes | Execution configuration (LLM settings, options, and event callbacks) |
| `config` | `ExecuteConfig` | Yes | LLM settings, options, and callbacks |
**Returns:** `Promise<ExecutionResult>`
Returns: `Promise<ExecutionResult>`
#### `PAGE_AGENT_EXT.dispose()`
Stop and destroy the current running agent.
Stop the current task.
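`execute()` and `dispose()` can be combined to put an upper bound on how long a task runs. The sketch below is one possible approach, not something the extension provides: `executeWithTimeout`, `PageAgentExtLike`, and `ExecutionResultLike` are hypothetical names, and the local types are trimmed stand-ins for the real ones in `@page-agent/core`. How the promise from `execute()` settles after `dispose()` is not specified here, so the wrapper rejects on its own timer.

```typescript
// Trimmed local stand-ins for the documented types (assumptions for this sketch).
interface ExecutionResultLike {
  success: boolean
  data?: unknown
}

interface PageAgentExtLike {
  execute: (task: string, config: Record<string, unknown>) => Promise<ExecutionResultLike>
  dispose: () => void
}

// Hypothetical wrapper: stops the task and rejects if it runs longer than timeoutMs.
async function executeWithTimeout(
  ext: PageAgentExtLike,
  task: string,
  config: Record<string, unknown>,
  timeoutMs: number,
): Promise<ExecutionResultLike> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      ext.dispose() // stop the running task
      reject(new Error(`Task timed out after ${timeoutMs} ms`))
    }, timeoutMs)
  })

  try {
    return await Promise.race([ext.execute(task, config), timeout])
  } finally {
    clearTimeout(timer) // avoid a stray timer once the race is decided
  }
}

// Stand-in object for demonstration only; in a page you would pass window.PAGE_AGENT_EXT.
const fakeExt: PageAgentExtLike = {
  execute: () => new Promise<ExecutionResultLike>(() => {}), // never settles
  dispose: () => console.log('dispose() called'),
}

executeWithTimeout(fakeExt, 'Open the settings page', {}, 50).catch((error: Error) =>
  console.log(error.message),
)
```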
## Types
Install `@page-agent/core` for full type definitions:
Install `@page-agent/core` for complete types:
```typescript
import type {
@@ -104,10 +118,7 @@ export interface ExecuteConfig {
apiKey: string
model: string
/**
* Whether to include the initial tab (that holds this main world script) in the task.
* @default true
*/
// Include the initial tab where page JS starts. Default: true.
includeInitialTab?: boolean
onStatusChange?: (status: AgentStatus) => void
@@ -119,20 +130,13 @@ export interface ExecuteConfig {
export type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
```
### AgentStatus
`AgentStatus`
```typescript
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
```
| Status | Description |
|--------|-------------|
| `idle` | Agent is idle, ready to execute |
| `running` | Agent is executing a task |
| `completed` | Task completed successfully |
| `error` | Task failed with an error |
### AgentActivity
`AgentActivity`
```typescript
type AgentActivity =
@@ -143,15 +147,7 @@ type AgentActivity =
| { type: 'error'; message: string }
```
| Type | Description |
|------|-------------|
| `thinking` | Agent is analyzing the page and planning |
| `executing` | Agent is executing a tool action |
| `executed` | Tool execution completed |
| `retrying` | Retrying after a failure |
| `error` | An error occurred |
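The activity types listed above can be mapped to status text for a UI. This is a minimal sketch: `ActivityLike` is a trimmed local stand-in for `AgentActivity` (the `retrying` variant's fields are left out because they are not shown here), and `describeActivity` is a hypothetical helper.

```typescript
// Trimmed stand-in for AgentActivity; import the real type from @page-agent/core in a project.
type ActivityLike =
  | { type: 'thinking' }
  | { type: 'executing'; tool: string }
  | { type: 'executed'; tool: string; duration: number }
  | { type: 'retrying' }
  | { type: 'error'; message: string }

// Hypothetical helper: turns an activity event into a short status line.
function describeActivity(activity: ActivityLike): string {
  switch (activity.type) {
    case 'thinking':
      return 'Agent is thinking...'
    case 'executing':
      return `Executing: ${activity.tool}`
    case 'executed':
      return `${activity.tool} completed in ${activity.duration}ms`
    case 'retrying':
      return 'Retrying after a failure...'
    case 'error':
      return `Error: ${activity.message}`
  }
}

// 'click' is a made-up tool name for illustration.
console.log(describeActivity({ type: 'executed', tool: 'click', duration: 120 }))
```

A function like this can be passed to `onActivity`, for example `onActivity: (activity) => console.log(describeActivity(activity))`.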
### HistoricalEvent
`HistoricalEvent`
```typescript
type HistoricalEvent =
@@ -162,7 +158,7 @@ type HistoricalEvent =
| { type: 'error'; message: string; rawResponse?: unknown }
```
### ExecutionResult
`ExecutionResult`
```typescript
interface ExecutionResult {
@@ -183,81 +179,22 @@ const result = await window.PAGE_AGENT_EXT!.execute(
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
}
)
if (result.success) {
console.log('Task completed:', result.data)
} else {
console.error('Task failed')
}
```
### Exclude Initial Tab
By default, the agent includes the initial tab (where the script runs) in the task. Set `includeInitialTab: false` to exclude it:
```typescript
const result = await window.PAGE_AGENT_EXT!.execute(
'Open a new tab and search for page-agent on GitHub',
{
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
includeInitialTab: false, // Agent will open new tabs only
includeInitialTab: false, // Optional: exclude current tab
onStatusChange: (status) => console.log(status),
onActivity: (activity) => console.log(activity),
}
)
```
### With Event Callbacks
### Stop the Current Task
```typescript
await window.PAGE_AGENT_EXT!.execute('Navigate to the settings page', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
onStatusChange: (status) => {
updateUI({ agentStatus: status })
},
onActivity: (activity) => {
switch (activity.type) {
case 'thinking':
showSpinner('Agent is thinking...')
break
case 'executing':
showSpinner(`Executing: ${activity.tool}`)
break
case 'executed':
log(`${activity.tool} completed in ${activity.duration}ms`)
break
case 'error':
showError(activity.message)
break
}
},
onHistoryUpdate: (history) => {
renderHistory(history)
},
})
```
### Stop Execution
```typescript
// Start a task
window.PAGE_AGENT_EXT!.execute('Scroll through all pages', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
})
// Later, stop it
window.PAGE_AGENT_EXT!.dispose()
```
## Window Type Declaration
If not using `@page-agent/core`, add this to your project:
If you are not importing `@page-agent/core`, add:
```typescript
import type {
@@ -283,7 +220,7 @@ declare global {
PAGE_AGENT_EXT_VERSION?: string
PAGE_AGENT_EXT?: {
version: string
execute: (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
execute: Execute
dispose: () => void
}
}


@@ -1,12 +1,18 @@
# Page Agent 浏览器插件 API
本文档介绍如何在网页应用中接入 Page Agent 浏览器插件。
在你的网页应用中接入 Page Agent 插件,并通过页面 JavaScript 发起多页面浏览器任务。
## 安装
### 1. 安装浏览器插件
从 Chrome 应用商店安装 Page Agent 插件。
首选渠道:
- Chrome 应用商店:https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj
通常更快提供最新构建的渠道:
- GitHub Releases:https://github.com/alibaba/page-agent/releases
### 2. 安装类型定义(推荐)
@@ -14,11 +20,19 @@
npm install @page-agent/core --save-dev
```
### 3. 配置认证
### 3. 授权(Token)
插件在页面加载后检测 `localStorage` 中的 token,匹配时才会注入 API。
token 用于让页面 JS 调用扩展 API(`window.PAGE_AGENT_EXT`)并执行多页面任务。
1. 打开插件的侧边栏面板,获取授权 token
为什么必须使用 token:
- 插件具备较广的浏览器权限(页面访问、导航、多标签控制)。
- 若被滥用,可能危害用户隐私与安全。
- 用户必须主动将 token 提供给其信任的应用。
配置步骤:
1. 在扩展侧边栏中复制 auth token。
2. 在页面中设置 token:
```typescript
@@ -60,32 +74,32 @@ if (await waitForExtension()) {
## 全局 API
插件在 `window` 对象上注入以下 API:
token 匹配后,插件会在 `window` 上注入 API。
### `window.PAGE_AGENT_EXT_VERSION`
插件版本号字符串(例如 `"1.0.0"`)。单独暴露版本号,方便在访问主 API 对象前进行版本检查。
插件版本号字符串,可用于在访问主 API 前做能力检查。
### `window.PAGE_AGENT_EXT`
主 API 命名空间对象,包含:
主命名空间对象。
#### `PAGE_AGENT_EXT.execute(task, config)`
执行 Agent 任务。
**参数:**
参数:
| 名称 | 类型 | 必填 | 说明 |
|------|------|------|------|
| ---- | ---- | ---- | ---- |
| `task` | `string` | 是 | 任务描述 |
| `config` | `ExecuteConfig` | 是 | 执行配置(LLM 设置、选项和事件回调) |
| `config` | `ExecuteConfig` | 是 | LLM 设置、执行选项和回调 |
**返回:** `Promise<ExecutionResult>`
返回:`Promise<ExecutionResult>`
#### `PAGE_AGENT_EXT.dispose()`
停止并销毁当前运行的 Agent。
停止当前任务。
## 类型定义
@@ -104,10 +118,7 @@ export interface ExecuteConfig {
apiKey: string
model: string
/**
* 是否将初始标签页(运行此脚本的页面)包含在任务中。
* @default true
*/
// 是否包含启动脚本所在标签页。默认 true。
includeInitialTab?: boolean
onStatusChange?: (status: AgentStatus) => void
@@ -119,20 +130,13 @@ export interface ExecuteConfig {
export type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
```
### AgentStatus
`AgentStatus`
```typescript
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
```
| 状态 | 说明 |
|------|------|
| `idle` | 空闲,准备执行 |
| `running` | 正在执行任务 |
| `completed` | 任务成功完成 |
| `error` | 任务执行失败 |
### AgentActivity
`AgentActivity`
```typescript
type AgentActivity =
@@ -143,15 +147,7 @@ type AgentActivity =
| { type: 'error'; message: string }
```
| 类型 | 说明 |
|------|------|
| `thinking` | Agent 正在分析页面并规划 |
| `executing` | 正在执行工具操作 |
| `executed` | 工具执行完成 |
| `retrying` | 失败后重试 |
| `error` | 发生错误 |
### HistoricalEvent
`HistoricalEvent`
```typescript
type HistoricalEvent =
@@ -162,7 +158,7 @@ type HistoricalEvent =
| { type: 'error'; message: string; rawResponse?: unknown }
```
### ExecutionResult
`ExecutionResult`
```typescript
interface ExecutionResult {
@@ -183,81 +179,22 @@ const result = await window.PAGE_AGENT_EXT!.execute(
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
}
)
if (result.success) {
console.log('任务完成:', result.data)
} else {
console.error('任务失败')
}
```
### 排除初始标签页
默认情况下,Agent 会将初始标签页(运行脚本的页面)包含在任务中。设置 `includeInitialTab: false` 可以排除它:
```typescript
const result = await window.PAGE_AGENT_EXT!.execute(
'打开新标签页并在 GitHub 上搜索 page-agent',
{
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
includeInitialTab: false, // Agent 只会打开新标签页
includeInitialTab: false, // 可选:排除当前标签页
onStatusChange: (status) => console.log(status),
onActivity: (activity) => console.log(activity),
}
)
```
### 使用事件回调
### 停止当前任务
```typescript
await window.PAGE_AGENT_EXT!.execute('导航到设置页面', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
onStatusChange: (status) => {
updateUI({ agentStatus: status })
},
onActivity: (activity) => {
switch (activity.type) {
case 'thinking':
showSpinner('Agent 正在思考...')
break
case 'executing':
showSpinner(`正在执行: ${activity.tool}`)
break
case 'executed':
log(`${activity.tool} 完成,耗时 ${activity.duration}ms`)
break
case 'error':
showError(activity.message)
break
}
},
onHistoryUpdate: (history) => {
renderHistory(history)
},
})
```
### 停止执行
```typescript
// 启动任务
window.PAGE_AGENT_EXT!.execute('滚动浏览所有页面', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
})
// 稍后停止
window.PAGE_AGENT_EXT!.dispose()
```
## Window 类型声明
如果不使用 `@page-agent/core`,可添加以下声明:
如果你不直接引入 `@page-agent/core`,可添加以下声明:
```typescript
import type {
@@ -283,7 +220,7 @@ declare global {
PAGE_AGENT_EXT_VERSION?: string
PAGE_AGENT_EXT?: {
version: string
execute: (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
execute: Execute
dispose: () => void
}
}


@@ -1,11 +1,13 @@
import { siGithub } from 'simple-icons'
import { siChromewebstore, siGithub } from 'simple-icons'
import BetaNotice from '@/components/BetaNotice'
import CodeEditor from '@/components/CodeEditor'
import { useLanguage } from '@/i18n/context'
export default function ChromeExtension() {
const { isZh } = useLanguage()
const chromeWebStoreUrl =
'https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj'
const githubReleasesUrl = 'https://github.com/alibaba/page-agent/releases'
return (
<div>
@@ -13,70 +15,92 @@ export default function ChromeExtension() {
<p className="text-xl text-gray-600 dark:text-gray-300 mb-8 leading-relaxed">
{isZh
? '可选的 Chrome 扩展,解锁多页任务和第三方 API 集成。'
: 'Optional Chrome extension that unlocks multi-page tasks and third-party API integration.'}
? '可选的 Chrome 扩展。PageAgent.js 继续负责页面内自动化;扩展 API 额外提供多页面任务、浏览器级控制,以及从浏览器外部发起任务的能力。'
: 'An optional Chrome extension. PageAgent.js keeps handling in-page automation, while the extension API adds multi-page tasks, browser-level control, and tasks initiated from outside the browser.'}
</p>
<BetaNotice />
<div className="space-y-8 mt-8">
{/* Hero Section */}
<section className="p-6 bg-linear-to-r from-blue-50 to-purple-50 dark:from-blue-900/20 dark:to-purple-900/20 rounded-xl">
<div className="flex items-start gap-4">
<div>
<p className="text-gray-600 dark:text-gray-300">
{isZh
? '解锁多页任务!借助 Chrome 扩展Agent 可以跨标签页和页面导航,突破单页限制。'
: 'Unlock multi-page tasks! With the Chrome extension, your agent can navigate across tabs and pages, breaking the single-page limitation.'}
</p>
</div>
</div>
</section>
{/* Features */}
<section>
<h2 className="text-2xl font-bold mb-4">{isZh ? '核心特性' : 'Key Features'}</h2>
<div className="grid md:grid-cols-2 gap-4">
<div className="grid md:grid-cols-3 gap-4">
<div className="p-4 bg-gray-50 dark:bg-gray-800 rounded-lg">
<h3 className="font-semibold mb-2">🔓 {isZh ? '多页任务' : 'Multi-Page Tasks'}</h3>
<p className="text-gray-600 dark:text-gray-300 text-sm">
{isZh
? '跨多个页面和标签页执行任务,不再限于单页操作。'
: 'Execute tasks across multiple pages and tabs. No longer limited to single-page operations.'}
? '跨多个页面和标签页连续执行任务,不再限于单页上下文。'
: 'Run tasks across multiple pages and tabs without being limited to a single page context.'}
</p>
</div>
<div className="p-4 bg-gray-50 dark:bg-gray-800 rounded-lg">
<h3 className="font-semibold mb-2">
🔌 {isZh ? '开放第三方接口' : 'Third-Party API'}
🧭 {isZh ? '浏览器级控制' : 'Browser-Level Control'}
</h3>
<p className="text-gray-600 dark:text-gray-300 text-sm">
{isZh
? '用户授权后,你的网页、本地 Agent 或云端 Agent 都能通过扩展操作用户浏览器!'
: 'After user authorization, your webpage, local agent, or cloud agent can control the browser through the extension.'}
? '支持跨标签导航、页面切换和更完整的浏览器自动化能力。'
: 'Enable richer browser automation, including cross-tab navigation and page switching.'}
</p>
</div>
<div className="p-4 bg-gray-50 dark:bg-gray-800 rounded-lg">
<h3 className="font-semibold mb-2">
🔌 {isZh ? '开放集成接口' : 'Open Integration API'}
</h3>
<p className="text-gray-600 dark:text-gray-300 text-sm">
{isZh
? '用户主动授权后,页面 JS、本地 Agent 或云端 Agent 可通过扩展发起多页面任务。'
: 'With explicit user authorization, page JS, local agents, or cloud agents can trigger multi-page tasks through the extension.'}
</p>
</div>
</div>
</section>
{/* Download */}
{/* Install */}
<section>
<h2 className="text-2xl font-bold mb-4">{isZh ? '下载测试版' : 'Download Beta'}</h2>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh
? '扩展目前处于 Beta 阶段,请从 GitHub Releases 下载最新版本。'
: 'The extension is currently in beta. Download the latest version from GitHub Releases.'}
</p>
<h2 className="text-2xl font-bold mb-4">{isZh ? '获取扩展' : 'Get the Extension'}</h2>
<div className="flex flex-wrap gap-3">
<a
href="https://github.com/alibaba/page-agent/releases"
href={chromeWebStoreUrl}
target="_blank"
rel="noopener noreferrer"
className="inline-flex items-center gap-2 px-6 py-3 bg-blue-600 hover:bg-blue-700 text-white font-medium rounded-lg transition-colors"
>
<svg className="w-5 h-5" fill="currentColor" viewBox="0 0 24 24">
<path d={siChromewebstore.path} />
</svg>
{isZh ? '从 Chrome 应用商店安装' : 'Install from Chrome Web Store'}
</a>
<a
href={githubReleasesUrl}
target="_blank"
rel="noopener noreferrer"
className="inline-flex items-center gap-2 px-6 py-3 bg-gray-900 hover:bg-gray-800 dark:bg-gray-700 dark:hover:bg-gray-600 text-white font-medium rounded-lg transition-colors"
>
<svg className="w-5 h-5" fill="currentColor" viewBox="0 0 24 24">
<path d={siGithub.path} />
</svg>
{isZh ? '前往 GitHub Releases 下载' : 'Download from GitHub Releases'}
{isZh ? 'GitHub Releases(更新版本)' : 'GitHub Releases (faster updates)'}
</a>
</div>
</section>
{/* Relationship with PageAgent.js */}
<section>
<h2 className="text-2xl font-bold mb-4">
{isZh ? '与 PageAgent.js 的关系' : 'How It Relates to PageAgent.js'}
</h2>
<div className="p-5 bg-gray-50 dark:bg-gray-800 rounded-lg space-y-3 text-gray-600 dark:text-gray-300">
<p>
{isZh
? 'PageAgent.js 本身即可在页面内完成自动化。Chrome 扩展是可选的能力扩展。'
: 'PageAgent.js already works for in-page automation. The Chrome extension is optional, not a dependency.'}
</p>
<p>
{isZh
? '通过扩展,你可以执行多页面任务、控制浏览器,以及从浏览器外部(本地服务或云端服务)发起任务。'
: 'With the extension, you can run multi-page tasks, control the browser, and trigger tasks from outside the browser (local or cloud services).'}
</p>
</div>
</section>
{/* Third-party Integration */}
@@ -86,32 +110,33 @@ export default function ChromeExtension() {
</h2>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh
? '用户授权后,外部应用可以调用扩展 API 来控制浏览器。'
: 'After user authorization, external applications can call the extension API to control the browser.'}
? '通过页面 JavaScript 调用 `window.PAGE_AGENT_EXT`,你的应用可以发起跨页面任务并控制浏览器行为。'
: 'By calling `window.PAGE_AGENT_EXT` from page JavaScript, your app can trigger multi-page tasks and control browser behavior.'}
</p>
{/* Auth Flow */}
<h3 className="text-xl font-semibold mb-3">{isZh ? '授权流程' : 'Authorization Flow'}</h3>
<h3 className="text-xl font-semibold mb-3">
{isZh ? '授权与安全' : 'Authorization and Security'}
</h3>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh
? '扩展使用基于 Token 的授权机制,扩展端和页面端必须持有匹配的 Token。'
: 'The extension uses a token-based authorization mechanism. Both extension and page must have matching tokens.'}
? '扩展权限范围较广(例如页面访问、导航、多标签控制)。若被滥用,可能危害用户隐私。为此,调用能力由 Token 保护,用户必须主动将 Token 提供给其信任的应用。'
: 'The extension has broad permissions (such as page access, navigation, and multi-tab control). If abused, it can harm user privacy. That is why access is protected by a token, and users must actively share the token only with applications they trust.'}
</p>
<CodeEditor
code={
isZh
? `// 1. 用户安装扩展并在扩展设置中配置 auth token
// 2. 你的页面读取相同的 token 并存入 localStorage
// 3. Token 匹配后,扩展会暴露 window.PAGE_AGENT_EXT 对象
? `// 1) 用户在扩展侧边栏获取 auth token
// 2) 仅在可信应用中设置该 token
// 3) token 匹配后,扩展会暴露 window.PAGE_AGENT_EXT
// ⚠️ 请在扩展弹窗中查看你的 auth token,然后填入下方
// ⚠️ 不要把 token 提供给不可信页面或脚本
localStorage.setItem('PageAgentExtUserAuthToken', '<从扩展中获取的-token>')`
: `// 1. User installs extension and sets an auth token in extension settings
// 2. Your page reads the same token and stores it in localStorage
// 3. After token match, extension exposes window.PAGE_AGENT_EXT object
: `// 1) Get auth token from the extension side panel
// 2) Set it only in trusted applications
// 3) After token match, extension exposes window.PAGE_AGENT_EXT
// ⚠️ Check your extension popup for the auth token
// ⚠️ Never provide the token to untrusted pages or scripts
localStorage.setItem('PageAgentExtUserAuthToken', '<your-token-from-extension>')`
}
language="javascript"
@@ -152,13 +177,87 @@ localStorage.setItem('PageAgentExtUserAuthToken', '<your-token-from-extension>')
</div>
</section>
<h3 className="text-xl font-semibold my-3">PAGE_AGENT_EXT.execute(task, config)</h3>
{/* TypeScript Declaration */}
<h2 className="text-2xl font-bold mb-4">
{isZh ? 'TypeScript 类型声明' : 'TypeScript Declaration'}
</h2>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh
? '使用配置执行任务。返回一个 Promise在任务完成时 resolve。config 参数包含 LLM 设置、选项和事件回调。'
: 'Execute a task with configuration. Returns a Promise that resolves when the task completes. Config includes LLM settings, options, and event callbacks.'}
? '推荐把 `execute` 的类型声明加入你的项目,获得完整类型提示。'
: 'Add this `execute` declaration to your project for full type support.'}
</p>
<CodeEditor
code={
isZh
? `import type {
AgentActivity,
AgentStatus,
ExecutionResult,
HistoricalEvent
} from '@page-agent/core'
interface ExecuteConfig {
baseURL: string // LLM API 端点
apiKey: string // API 密钥
model: string // 模型名称
includeInitialTab?: boolean
onStatusChange?: (status: AgentStatus) => void
onActivity?: (activity: AgentActivity) => void
onHistoryUpdate?: (history: HistoricalEvent[]) => void
onDispose?: () => void
}
type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
declare global {
interface Window {
PAGE_AGENT_EXT_VERSION?: string
PAGE_AGENT_EXT?: {
version: string
execute: Execute
dispose: () => void
}
}
}`
: `import type {
AgentActivity,
AgentStatus,
ExecutionResult,
HistoricalEvent
} from '@page-agent/core'
interface ExecuteConfig {
baseURL: string // LLM API endpoint
apiKey: string // API key
model: string // Model name
includeInitialTab?: boolean
onStatusChange?: (status: AgentStatus) => void
onActivity?: (activity: AgentActivity) => void
onHistoryUpdate?: (history: HistoricalEvent[]) => void
onDispose?: () => void
}
type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
declare global {
interface Window {
PAGE_AGENT_EXT_VERSION?: string
PAGE_AGENT_EXT?: {
version: string
execute: Execute
dispose: () => void
}
}
}`
}
language="typescript"
/>
<h3 className="text-xl font-semibold mt-6 mb-3">PAGE_AGENT_EXT.execute(task, config)</h3>
<CodeEditor
code={
isZh
@@ -168,7 +267,7 @@ const result = await window.PAGE_AGENT_EXT.execute(
{
baseURL: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'gpt-5-2',
model: 'gpt-5.2',
// includeInitialTab: false, // 设为 false 排除初始标签页
onStatusChange: status => console.log('状态变化:', status),
onActivity: activity => console.log('活动:', activity),
@@ -184,7 +283,7 @@ const result = await window.PAGE_AGENT_EXT.execute(
{
baseURL: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'gpt-5-2',
model: 'gpt-5.2',
// includeInitialTab: false, // Set to false to exclude initial tab
onStatusChange: status => console.log('Status change:', status),
onActivity: activity => console.log('Activity:', activity),
@@ -217,100 +316,17 @@ window.PAGE_AGENT_EXT.dispose()`
/>
</section>
{/* ExecuteConfig */}
<section>
<h2 className="text-2xl font-bold mb-4">{isZh ? '执行配置' : 'Execute Configuration'}</h2>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh
? 'config 参数包含 LLM 设置、选项和事件回调,用于控制任务执行行为。'
: 'The config parameter includes LLM settings, options, and event callbacks to control task execution behavior.'}
</p>
<CodeEditor
code={
isZh
? `interface ExecuteConfig {
baseURL: string // LLM API 端点
apiKey: string // API 密钥
model: string // 模型名称
// 是否将初始标签页包含在任务中,默认 true
includeInitialTab?: boolean
// Agent 状态变化时调用(idle, running, error, completed 等)
onStatusChange?: (status: AgentStatus) => void
// Agent 执行活动时调用(如点击、输入、导航等操作)
onActivity?: (activity: AgentActivity) => void
// 历史记录更新时调用(包含完整的事件历史)
onHistoryUpdate?: (history: HistoricalEvent[]) => void
// Agent 被停止时调用
onDispose?: () => void
}`
: `interface ExecuteConfig {
baseURL: string // LLM API endpoint
apiKey: string // API key
model: string // Model name
// Whether to include the initial tab in the task, default true
includeInitialTab?: boolean
// Called when agent status changes (idle, running, error, completed, etc.)
onStatusChange?: (status: AgentStatus) => void
// Called when agent performs an activity (click, input, navigation, etc.)
onActivity?: (activity: AgentActivity) => void
// Called when history is updated (contains full event history)
onHistoryUpdate?: (history: HistoricalEvent[]) => void
// Called when agent is disposed
onDispose?: () => void
}`
}
language="typescript"
/>
</section>
{/* Security Notice */}
<section className="p-4 bg-yellow-50 dark:bg-yellow-900/20 rounded-lg">
<h3 className="text-lg font-semibold text-yellow-900 dark:text-yellow-300 mb-2">
{isZh ? '安全须知' : 'Security Notes'}
</h3>
<ul className="text-gray-600 dark:text-gray-300 space-y-1 text-sm">
<li>
{' '}
{isZh
? '用户必须在扩展设置中显式授权每个域名'
: 'Users must explicitly authorize each domain in extension settings'}
</li>
<li>
{' '}
{isZh
? '生产环境建议使用后端代理 LLM API Key'
: 'Consider using backend proxy for LLM API keys in production'}
</li>
</ul>
</section>
{/* Integration Guide */}
<section>
<h2 className="text-2xl font-bold mb-4">
{isZh
? '将 MultiPageAgent 融入你自己的插件'
? '将 MultiPageAgent 集成到你自己的插件'
: 'Integrate MultiPageAgent into Your Extension'}
</h2>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh
? '你可以将 MultiPageAgent 集成到自己的浏览器扩展中,实现跨页面的 AI 自动化能力。'
: 'You can integrate MultiPageAgent into your own browser extension for cross-page AI automation capabilities.'}
</p>
<p className="text-gray-600 dark:text-gray-300 mb-4">TODO</p>
<p className="text-gray-600 dark:text-gray-300 mb-4">
{isZh ? '参考源码实现:' : 'Reference implementation:'}
</p>
? '建议先阅读扩展 API 文档,再参考 background entry implementation。'
: 'Start with the extension API docs, then use the background entry implementation as a reference.'}
<a
href="https://github.com/alibaba/page-agent/blob/main/packages/extension/src/entrypoints/background.ts"
target="_blank"
@@ -322,6 +338,7 @@ window.PAGE_AGENT_EXT.dispose()`
</svg>
packages/extension/src/entrypoints/background.ts
</a>
</p>
</section>
</div>
</div>


@@ -28,20 +28,5 @@
"erasableSyntaxOnly": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedSideEffectImports": true
// "paths": {
// // Simplified monorepo solution (raw npm workspace with hoisting)
// "@page-agent/page-controller": ["./packages/page-controller/src/PageController.ts"],
// "page-agent": ["./packages/page-agent/src/PageAgent.ts"]
// }
}
// "references": [
// { "path": "./packages/page-controller" },
// { "path": "./packages/page-agent" },
// { "path": "./packages/website" }
// ],
// "include": ["packages/*/src/**/*.ts", "packages/*/src/**/*.tsx"],
// "exclude": ["node_modules", "dist", "packages/*/dist"]
// "files": ["env.d.ts"]
// "files": []
}


@@ -1,3 +1,5 @@
// this is only for IDE ts language server to work.
// do not use this for building or linting.
{
"extends": "./tsconfig.base.json",
"references": [