docs: update extension related docs

Simon
2026-02-12 17:19:14 +08:00
parent 11d66f42c4
commit f19b3cc2cc
9 changed files with 312 additions and 438 deletions


@@ -1,12 +1,18 @@
# Page Agent Extension API
This document describes how to integrate the Page Agent browser extension into your web application.
Integrate the Page Agent extension into your web app and trigger multi-page browser tasks from page JavaScript.
## Installation
### 1. Install the browser extension
Install the Page Agent extension from the Chrome Web Store.
Primary channel:
- Chrome Web Store: https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj
New releases often appear earlier on:
- GitHub Releases: https://github.com/alibaba/page-agent/releases
### 2. Install type definitions (recommended)
@@ -14,11 +20,19 @@ Install the Page Agent extension from the Chrome Web Store.
npm install @page-agent/core --save-dev
```
### 3. Set up authentication
### 3. Authorization (Token)
The extension only injects APIs when it detects a valid token in `localStorage`.
The token allows your page JS to call the extension API (`window.PAGE_AGENT_EXT`) and execute multi-page tasks.
1. Open the extension's side panel to get your authorization token
Why token-based access is required:
- The extension has broad browser permissions (page access, navigation, multi-tab control).
- If abused, it can harm user privacy and security.
- Users must explicitly provide the token only to applications they trust.
Setup:
1. Open the extension side panel and copy your auth token.
2. Set the token in your page:
```typescript
@@ -60,36 +74,36 @@ if (await waitForExtension()) {
## Global API
The extension injects the following APIs into the `window` object:
Once the token matches, the extension injects its APIs into `window`.
### `window.PAGE_AGENT_EXT_VERSION`
Extension version string (e.g., `"1.0.0"`). This is exposed separately to allow version checking before accessing the main API object.
Extension version string (for capability checks before using the main API).
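A capability check on the version string could look roughly like the sketch below. It assumes the version is a dotted numeric string such as `"1.0.0"`; `parseVersion` and `isAtLeast` are hypothetical helper names, not part of the extension API.

```typescript
// Assumption: versions are dotted numeric strings such as "1.0.0".
// Non-numeric parts are treated as 0 in this sketch.
function parseVersion(version: string): number[] {
  return version.split('.').map((part) => {
    const n = Number.parseInt(part, 10)
    return Number.isNaN(n) ? 0 : n
  })
}

// Hypothetical helper: true when `current` is the same as or newer than `minimum`.
// An undefined `current` (extension not injected) never satisfies the check.
function isAtLeast(current: string | undefined, minimum: string): boolean {
  if (!current) return false
  const a = parseVersion(current)
  const b = parseVersion(minimum)
  const length = Math.max(a.length, b.length)
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    if (x !== y) return x > y
  }
  return true
}
```

In page code this might be called as `isAtLeast(window.PAGE_AGENT_EXT_VERSION, '1.0.0')` before touching `window.PAGE_AGENT_EXT`.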
### `window.PAGE_AGENT_EXT`
Main API namespace object containing:
Main namespace object.
#### `PAGE_AGENT_EXT.execute(task, config)`
Execute an agent task.
Execute one agent task.
**Parameters:**
Parameters:
| Name | Type | Required | Description |
|------|------|----------|-------------|
| ---- | ---- | -------- | ----------- |
| `task` | `string` | Yes | Task description |
| `config` | `ExecuteConfig` | Yes | Execution configuration (LLM settings, options, and event callbacks) |
| `config` | `ExecuteConfig` | Yes | LLM settings, options, and callbacks |
**Returns:** `Promise<ExecutionResult>`
Returns: `Promise<ExecutionResult>`
#### `PAGE_AGENT_EXT.dispose()`
Stop and destroy the current running agent.
Stop the current task.
## Types
Install `@page-agent/core` for full type definitions:
Install `@page-agent/core` for complete types:
```typescript
import type {
@@ -104,10 +118,7 @@ export interface ExecuteConfig {
apiKey: string
model: string
/**
* Whether to include the initial tab (that holds this main world script) in the task.
* @default true
*/
// Include the initial tab where page JS starts. Default: true.
includeInitialTab?: boolean
onStatusChange?: (status: AgentStatus) => void
@@ -119,20 +130,13 @@ export interface ExecuteConfig {
export type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
```
### AgentStatus
`AgentStatus`
```typescript
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
```
| Status | Description |
|--------|-------------|
| `idle` | Agent is idle, ready to execute |
| `running` | Agent is executing a task |
| `completed` | Task completed successfully |
| `error` | Task failed with an error |
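As a small illustration of how these statuses might be consumed in an `onStatusChange` callback, here is a sketch. `isTerminalStatus` and `statusLabel` are hypothetical helpers, and the label strings are placeholders.

```typescript
// Mirrors the AgentStatus union shown above.
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'

// Hypothetical helper: true once a task has finished, successfully or not.
function isTerminalStatus(status: AgentStatus): boolean {
  return status === 'completed' || status === 'error'
}

// Hypothetical helper: short label for a status indicator in the UI.
function statusLabel(status: AgentStatus): string {
  switch (status) {
    case 'idle':
      return 'Ready'
    case 'running':
      return 'Running'
    case 'completed':
      return 'Completed'
    case 'error':
      return 'Failed'
  }
}
```

A callback such as `onStatusChange: (s) => { if (isTerminalStatus(s)) hideSpinner() }` could build on this, where `hideSpinner` stands in for whatever your UI provides.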
### AgentActivity
`AgentActivity`
```typescript
type AgentActivity =
@@ -143,15 +147,7 @@ type AgentActivity =
| { type: 'error'; message: string }
```
| Type | Description |
|------|-------------|
| `thinking` | Agent is analyzing the page and planning |
| `executing` | Agent is executing a tool action |
| `executed` | Tool execution completed |
| `retrying` | Retrying after a failure |
| `error` | An error occurred |
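The activity types above lend themselves to a discriminated-union `switch`. The sketch below uses a local `ActivitySketch` type, because only the `error` variant is fully visible in this diff; the `tool` and `duration` fields are assumptions, and the real definitions live in `@page-agent/core`.

```typescript
// Assumed shape for illustration only. Only the `error` variant is confirmed
// by the type shown above; `tool` and `duration` are assumed fields.
type ActivitySketch =
  | { type: 'thinking' }
  | { type: 'executing'; tool: string }
  | { type: 'executed'; tool: string; duration: number }
  | { type: 'retrying' }
  | { type: 'error'; message: string }

// Hypothetical helper: turn an activity into a one-line message for the UI.
function describeActivity(activity: ActivitySketch): string {
  switch (activity.type) {
    case 'thinking':
      return 'Agent is thinking...'
    case 'executing':
      return `Executing: ${activity.tool}`
    case 'executed':
      return `${activity.tool} completed in ${activity.duration}ms`
    case 'retrying':
      return 'Retrying after a failure...'
    case 'error':
      return `Error: ${activity.message}`
  }
}
```

Switching on `type` lets TypeScript narrow each branch, so `activity.tool` is only reachable where the variant declares it.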
### HistoricalEvent
`HistoricalEvent`
```typescript
type HistoricalEvent =
@@ -162,7 +158,7 @@ type HistoricalEvent =
| { type: 'error'; message: string; rawResponse?: unknown }
```
### ExecutionResult
`ExecutionResult`
```typescript
interface ExecutionResult {
@@ -183,81 +179,22 @@ const result = await window.PAGE_AGENT_EXT!.execute(
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
}
)
if (result.success) {
console.log('Task completed:', result.data)
} else {
console.error('Task failed')
}
```
### Exclude Initial Tab
By default, the agent includes the initial tab (where the script runs) in the task. Set `includeInitialTab: false` to exclude it:
```typescript
const result = await window.PAGE_AGENT_EXT!.execute(
'Open a new tab and search for page-agent on GitHub',
{
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
includeInitialTab: false, // Agent will open new tabs only
includeInitialTab: false, // Optional: exclude current tab
onStatusChange: (status) => console.log(status),
onActivity: (activity) => console.log(activity),
}
)
```
### With Event Callbacks
### Stop the Current Task
```typescript
await window.PAGE_AGENT_EXT!.execute('Navigate to the settings page', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
onStatusChange: (status) => {
updateUI({ agentStatus: status })
},
onActivity: (activity) => {
switch (activity.type) {
case 'thinking':
showSpinner('Agent is thinking...')
break
case 'executing':
showSpinner(`Executing: ${activity.tool}`)
break
case 'executed':
log(`${activity.tool} completed in ${activity.duration}ms`)
break
case 'error':
showError(activity.message)
break
}
},
onHistoryUpdate: (history) => {
renderHistory(history)
},
})
```
### Stop Execution
```typescript
// Start a task
window.PAGE_AGENT_EXT!.execute('Scroll through all pages', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
})
// Later, stop it
window.PAGE_AGENT_EXT!.dispose()
```
## Window Type Declaration
If not using `@page-agent/core`, add this to your project:
If you are not importing `@page-agent/core`, add:
```typescript
import type {
@@ -283,7 +220,7 @@ declare global {
PAGE_AGENT_EXT_VERSION?: string
PAGE_AGENT_EXT?: {
version: string
execute: (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
execute: Execute
dispose: () => void
}
}


@@ -1,12 +1,18 @@
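# Page Agent Browser Extension API
This document describes how to integrate the Page Agent browser extension into your web application.
Integrate the Page Agent extension into your web app and start multi-page browser tasks from page JavaScript.
## Installation
### 1. Install the browser extension
Install the Page Agent extension from the Chrome Web Store.
Primary channel:
- Chrome Web Store: https://chromewebstore.google.com/detail/page-agent-ext/akldabonmimlicnjlflnapfeklbfemhj
Channel that usually gets the latest builds sooner:
- GitHub Releases: https://github.com/alibaba/page-agent/releases
### 2. Install type definitions (recommended)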
@@ -14,11 +20,19 @@
npm install @page-agent/core --save-dev
```
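### 3. Set up authentication
### 3. Authorization (Token)
After the page loads, the extension checks the token in `localStorage` and injects the API only when it matches.
The token lets your page JS call the extension API (`window.PAGE_AGENT_EXT`) and run multi-page tasks.
1. Open the extension's side panel to get your authorization token
Why a token is required:
- The extension has broad browser permissions (page access, navigation, multi-tab control).
- If abused, it can harm user privacy and security.
- Users must actively provide the token to applications they trust.
Setup:
1. Copy the auth token from the extension side panel.
2. Set the token in your page: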
```typescript
@@ -60,32 +74,32 @@ if (await waitForExtension()) {
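## Global API
The extension injects the following APIs into the `window` object:
Once the token matches, the extension injects its APIs into `window`.
### `window.PAGE_AGENT_EXT_VERSION`
Extension version string (e.g., `"1.0.0"`). It is exposed separately so you can check the version before accessing the main API object.
Extension version string, usable for capability checks before accessing the main API.
### `window.PAGE_AGENT_EXT`
Main API namespace object, containing:
Main namespace object.
#### `PAGE_AGENT_EXT.execute(task, config)`
Execute an agent task.
**Parameters:**
Parameters:
| Name | Type | Required | Description |
|------|------|------|------|
| ---- | ---- | ---- | ---- |
| `task` | `string` | Yes | Task description |
| `config` | `ExecuteConfig` | Yes | Execution configuration (LLM settings, options, and event callbacks) |
| `config` | `ExecuteConfig` | Yes | LLM settings, execution options, and callbacks |
**Returns:** `Promise<ExecutionResult>`
Returns: `Promise<ExecutionResult>`
#### `PAGE_AGENT_EXT.dispose()`
Stop and destroy the currently running agent.
Stop the current task.
## Types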
@@ -104,10 +118,7 @@ export interface ExecuteConfig {
apiKey: string
model: string
/**
* Whether to include the initial tab (the page running this script) in the task.
* @default true
*/
// Whether to include the tab where the launching script runs. Default: true.
includeInitialTab?: boolean
onStatusChange?: (status: AgentStatus) => void
@@ -119,20 +130,13 @@ export interface ExecuteConfig {
export type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
```
### AgentStatus
`AgentStatus`
```typescript
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
```
| Status | Description |
|------|------|
| `idle` | Idle, ready to execute |
| `running` | Executing a task |
| `completed` | Task completed successfully |
| `error` | Task execution failed |
### AgentActivity
`AgentActivity`
```typescript
type AgentActivity =
@@ -143,15 +147,7 @@ type AgentActivity =
| { type: 'error'; message: string }
```
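| Type | Description |
|------|------|
| `thinking` | Agent is analyzing the page and planning |
| `executing` | Executing a tool action |
| `executed` | Tool execution completed |
| `retrying` | Retrying after a failure |
| `error` | An error occurred |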
### HistoricalEvent
`HistoricalEvent`
```typescript
type HistoricalEvent =
@@ -162,7 +158,7 @@ type HistoricalEvent =
| { type: 'error'; message: string; rawResponse?: unknown }
```
### ExecutionResult
`ExecutionResult`
```typescript
interface ExecutionResult {
@@ -183,81 +179,22 @@ const result = await window.PAGE_AGENT_EXT!.execute(
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
}
)
if (result.success) {
console.log('Task completed:', result.data)
} else {
console.error('Task failed')
}
```
### Exclude Initial Tab
By default, the agent includes the initial tab (the page running the script) in the task. Set `includeInitialTab: false` to exclude it:
```typescript
const result = await window.PAGE_AGENT_EXT!.execute(
'Open a new tab and search for page-agent on GitHub',
{
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
includeInitialTab: false, // Agent will open new tabs only
includeInitialTab: false, // Optional: exclude the current tab
onStatusChange: (status) => console.log(status),
onActivity: (activity) => console.log(activity),
}
)
```
### With Event Callbacks
### Stop the Current Task
```typescript
await window.PAGE_AGENT_EXT!.execute('Navigate to the settings page', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
onStatusChange: (status) => {
updateUI({ agentStatus: status })
},
onActivity: (activity) => {
switch (activity.type) {
case 'thinking':
showSpinner('Agent is thinking...')
break
case 'executing':
showSpinner(`Executing: ${activity.tool}`)
break
case 'executed':
log(`${activity.tool} completed in ${activity.duration}ms`)
break
case 'error':
showError(activity.message)
break
}
},
onHistoryUpdate: (history) => {
renderHistory(history)
},
})
```
### Stop Execution
```typescript
// Start a task
window.PAGE_AGENT_EXT!.execute('Scroll through all pages', {
baseURL: 'https://api.openai.com/v1',
apiKey: process.env.OPENAI_API_KEY!,
model: 'gpt-5.2',
})
// Later, stop it
window.PAGE_AGENT_EXT!.dispose()
```
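## Window Type Declaration
If you are not using `@page-agent/core`, you can add the following declaration:
If you do not import `@page-agent/core` directly, you can add the following declaration: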
```typescript
import type {
@@ -283,7 +220,7 @@ declare global {
PAGE_AGENT_EXT_VERSION?: string
PAGE_AGENT_EXT?: {
version: string
execute: (task: string, config: ExecuteConfig) => Promise<ExecutionResult>
execute: Execute
dispose: () => void
}
}