docs: update instructions. add advanced section

This commit is contained in:
Simon
2026-01-19 16:47:48 +08:00
parent b217e6a2ca
commit 3ce4d8e3fe
12 changed files with 935 additions and 223 deletions

View File

@@ -4,11 +4,12 @@
This is a **monorepo** with npm workspaces:
- **Core Library** (`packages/page-agent/`) - AI agent for browser DOM automation, published as `page-agent` on npm
- **Page Agent** (`packages/page-agent/`) - Main entry with built-in UI Panel, published as `page-agent` on npm
- **Website** (`packages/website/`) - React docs and landing page. **When working on website, follow `packages/website/AGENTS.md`**
Internal packages:
- **Core** (`packages/core/`) - PageAgentCore without UI (npm: `@page-agent/core`)
- **CDN** (`packages/cdn/`) - IIFE builds for script tag usage (npm: `@page-agent/cdn`)
- **LLMs** (`packages/llms/`) - LLM client with reflection-before-action mental model
- **Page Controller** (`packages/page-controller/`) - DOM operations and visual feedback (SimulatorMask), independent of LLM
@@ -31,7 +32,8 @@ Simple monorepo solution: TypeScript references + Vite aliases. Update tsconfig
```
packages/
├── page-agent/ # npm: "page-agent" ⭐ MAIN
├── page-agent/ # npm: "page-agent" ⭐ MAIN (with Panel UI)
├── core/ # npm: "@page-agent/core" (headless, no UI)
├── cdn/ # npm: "@page-agent/cdn" (IIFE builds)
├── website/ # @page-agent/website (private)
├── llms/ # @page-agent/llms
@@ -43,9 +45,10 @@ packages/
### Module Boundaries
- **Page Agent**: Core lib. Imports from `@page-agent/llms`, `@page-agent/page-controller`, `@page-agent/ui`
- **Page Agent**: Main entry with UI. Extends PageAgentCore and adds Panel. Imports from `@page-agent/core`, `@page-agent/ui`
- **Core**: PageAgentCore without UI. Imports from `@page-agent/llms`, `@page-agent/page-controller`
- **LLMs**: LLM client with MacroToolInput contract. No dependency on page-agent
- **UI**: Panel and i18n. No dependency on page-agent
- **UI**: Panel and i18n. Decoupled from PageAgent via PanelAgentAdapter interface
- **Page Controller**: DOM operations with optional visual feedback (SimulatorMask). No LLM dependency. Enable mask via `enableMask: true` config
### PageController ↔ PageAgent Communication
@@ -86,11 +89,19 @@ Demo build supports query params (e.g., `?model=gpt-4&lang=en-US`).
### Page Agent (`packages/page-agent/`)
| File | Description |
| ------------------ | --------------------------------------- |
| `src/PageAgent.ts` | ⭐ Main AI agent class |
| `src/umd.ts` | CDN/UMD entry with auto-init |
| `src/tools/` | Tool definitions calling PageController |
| File | Description |
| ------------------ | ---------------------------------------------- |
| `src/PageAgent.ts` | ⭐ Main class with UI, extends PageAgentCore |
| `src/iife.ts` | IIFE/CDN entry |
### Core (`packages/core/`)
| File | Description |
| ----------------------- | ------------------------------------------- |
| `src/PageAgentCore.ts` | ⭐ Core agent class without UI |
| `src/tools/` | Tool definitions calling PageController |
| `src/config/` | Configuration types and constants |
| `src/prompts/` | System prompt templates |
### LLMs (`packages/llms/`)
@@ -113,7 +124,7 @@ Demo build supports query params (e.g., `?model=gpt-4&lang=en-US`).
### New Agent Tool
1. Implement in `packages/page-agent/src/tools/index.ts`
1. Implement in `packages/core/src/tools/index.ts`
2. If tool needs DOM ops, add method to PageController first
3. Tool calls `this.pageController.methodName()` for DOM interactions

View File

@@ -20,10 +20,11 @@ Thank you for your interest in contributing to Page-Agent! We welcome contributi
### Project Structure
This is a **monorepo** with npm workspaces containing **two main packages**:
This is a **monorepo** with npm workspaces containing **3 main packages**:
1. **Core Library** (`packages/page-agent/`) - Pure JavaScript/TypeScript AI agent library for browser DOM automation, published as `page-agent` on npm
2. **Website** (`packages/website/`) - React documentation and landing page. Also as demo and test page for the core lib. private package `@page-agent/website`
- **Page Agent** (`packages/page-agent/`) - Main entry with built-in UI Panel, published as `page-agent` on npm
- **Core** (`packages/core/`) - Core agent logic without UI (npm: `@page-agent/core`)
- **Website** (`packages/website/`) - React documentation and landing page. Also as demo and test page for the core lib. private package `@page-agent/website`
We use a simplified monorepo solution with `native npm-workspace + ts reference + vite alias`. No fancy tooling. Hoisting is required.

View File

@@ -76,7 +76,8 @@ PageAgent adopts a simplified monorepo structure:
```
packages/
├── page-agent/ # AI agent (npm: page-agent)
├── page-agent/ # AI agent with UI Panel(npm: page-agent)
├── core/ # Agent core logic without UI(npm: @page-agent/core)
├── llms/ # LLM 客户端 (npm: @page-agent/llms)
├── page-controller/ # DOM 操作 & 蒙层 & 模拟鼠标 (npm: @page-agent/page-controller)
├── ui/ # 面板 & i18n (npm: @page-agent/ui)

View File

@@ -76,7 +76,8 @@ PageAgent adopts a simplified monorepo structure:
```
packages/
├── page-agent/ # AI agent (npm: page-agent)
├── page-agent/ # AI agent with UI Panel(npm: page-agent)
├── core/ # Agent core logic without UI(npm: @page-agent/core)
├── llms/ # LLM client (npm: @page-agent/llms)
├── page-controller/ # DOM operations & Visual Mask (npm: @page-agent/page-controller)
├── ui/ # Panel & i18n (npm: @page-agent/ui)

View File

@@ -0,0 +1,167 @@
/**
* API Reference component for displaying TypeScript interface definitions
*
* Provides a beautiful, readable table for documenting API interfaces
*/
import * as React from 'react'
import { cn } from '@/lib/utils'
import { Badge } from './badge'
// ============================================================================
// Types
// ============================================================================
export interface PropDefinition {
/** Property name */
name: string
/** TypeScript type (can include generics, unions, etc.) */
type: string
/** Whether the property is required */
required?: boolean
/** Default value if any */
defaultValue?: string
/** Description of the property */
description: React.ReactNode
/** Mark as experimental/deprecated */
status?: 'experimental' | 'deprecated'
}
export interface APIReferenceProps {
/** Title for the API section */
title?: string
/** Optional description */
description?: React.ReactNode
/** Property definitions */
properties: PropDefinition[]
/** Additional CSS classes */
className?: string
}
// ============================================================================
// Component
// ============================================================================
export function APIReference({ title, description, properties, className }: APIReferenceProps) {
return (
<div className={cn('my-6', className)}>
{title && (
<h3 className="text-lg font-semibold text-gray-900 dark:text-gray-100 mb-2">{title}</h3>
)}
{description && (
<p className="text-sm text-gray-600 dark:text-gray-400 mb-4">{description}</p>
)}
<div className="overflow-hidden rounded-lg border border-gray-200 dark:border-gray-700">
<table className="w-full text-sm">
<thead>
<tr className="bg-gray-50 dark:bg-gray-800/50">
<th className="px-4 py-3 text-left font-medium text-gray-600 dark:text-gray-300">
Property
</th>
<th className="px-4 py-3 text-left font-medium text-gray-600 dark:text-gray-300">
Type
</th>
<th className="px-4 py-3 text-left font-medium text-gray-600 dark:text-gray-300 hidden md:table-cell">
Default
</th>
<th className="px-4 py-3 text-left font-medium text-gray-600 dark:text-gray-300">
Description
</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-100 dark:divide-gray-800">
{properties.map((prop) => (
<PropRow key={prop.name} {...prop} />
))}
</tbody>
</table>
</div>
</div>
)
}
function PropRow({ name, type, required, defaultValue, description, status }: PropDefinition) {
return (
<tr className="bg-white dark:bg-gray-900 hover:bg-gray-50 dark:hover:bg-gray-800/50 transition-colors">
{/* Property name */}
<td className="px-4 py-3 align-top">
<div className="flex items-center gap-2 flex-wrap">
<code className="font-mono text-sm font-medium text-indigo-600 dark:text-indigo-400">
{name}
</code>
{required && (
<Badge
variant="outline"
className="text-[10px] px-1.5 py-0 border-red-300 text-red-600 dark:border-red-800 dark:text-red-400"
>
required
</Badge>
)}
{status === 'experimental' && (
<Badge
variant="outline"
className="text-[10px] px-1.5 py-0 border-amber-300 text-amber-600 dark:border-amber-800 dark:text-amber-400"
>
experimental
</Badge>
)}
{status === 'deprecated' && (
<Badge
variant="outline"
className="text-[10px] px-1.5 py-0 border-gray-300 text-gray-500 dark:border-gray-700 dark:text-gray-500 line-through"
>
deprecated
</Badge>
)}
</div>
</td>
{/* Type */}
<td className="px-4 py-3 align-top">
<code className="font-mono text-xs text-gray-700 dark:text-gray-300 bg-gray-100 dark:bg-gray-800 px-1.5 py-0.5 rounded whitespace-nowrap">
{type}
</code>
</td>
{/* Default value */}
<td className="px-4 py-3 align-top hidden md:table-cell">
{defaultValue ? (
<code className="font-mono text-xs text-gray-600 dark:text-gray-400">{defaultValue}</code>
) : (
<span className="text-gray-400 dark:text-gray-600">-</span>
)}
</td>
{/* Description */}
<td className="px-4 py-3 align-top text-gray-600 dark:text-gray-400">{description}</td>
</tr>
)
}
// ============================================================================
// Utility Components
// ============================================================================
/** Code inline span for type references in descriptions */
export function TypeRef({ children }: { children: React.ReactNode }) {
return (
<code className="font-mono text-xs text-indigo-600 dark:text-indigo-400 bg-indigo-50 dark:bg-indigo-950/50 px-1 py-0.5 rounded">
{children}
</code>
)
}
/** Section divider for grouping related APIs */
export function APIDivider({ title }: { title: string }) {
return (
<div className="flex items-center gap-4 my-8">
<div className="h-px flex-1 bg-gradient-to-r from-transparent via-gray-200 dark:via-gray-700 to-transparent" />
<span className="text-xs font-medium uppercase tracking-wider text-gray-500 dark:text-gray-400">
{title}
</span>
<div className="h-px flex-1 bg-gradient-to-r from-transparent via-gray-200 dark:via-gray-700 to-transparent" />
</div>
)
}

View File

@@ -24,6 +24,7 @@ export default {
introduction: 'Introduction',
features: 'Features',
integration: 'Integration',
advanced: 'Advanced',
overview: 'Overview',
quick_start: 'Quick Start',
limitations: 'Limitations',
@@ -32,9 +33,10 @@ export default {
knowledge_injection: 'Instructions',
data_masking: 'Data Masking',
cdn_setup: 'CDN Setup',
configuration: 'Configuration',
best_practices: 'Best Practices',
third_party_agent: 'Third-party Agent',
security_permissions: 'Security & Permissions',
page_agent: 'PageAgent',
page_agent_core: 'PageAgentCore',
},
}

View File

@@ -23,6 +23,7 @@ export default {
introduction: '介绍',
features: '功能特性',
integration: '集成指南',
advanced: '高级',
overview: '概览',
quick_start: '快速开始',
limitations: '使用限制',
@@ -31,9 +32,10 @@ export default {
knowledge_injection: '知识注入',
data_masking: '数据脱敏',
cdn_setup: 'CDN 引入',
configuration: '配置选项',
best_practices: '最佳实践',
third_party_agent: '接入第三方 Agent',
security_permissions: '安全与权限',
page_agent: 'PageAgent',
page_agent_core: 'PageAgentCore',
},
}

View File

@@ -41,7 +41,6 @@ export default function DocsLayout({ children }: DocsLayoutProps) {
{
title: t('nav.integration'),
items: [
{ title: t('nav.configuration'), path: '/integration/configuration' },
{ title: t('nav.third_party_agent'), path: '/integration/third-party-agent' },
{ title: t('nav.cdn_setup'), path: '/integration/cdn-setup' },
{
@@ -51,6 +50,13 @@ export default function DocsLayout({ children }: DocsLayoutProps) {
{ title: '🚧 ' + t('nav.best_practices'), path: '/integration/best-practices' },
],
},
{
title: t('nav.advanced'),
items: [
{ title: t('nav.page_agent'), path: '/advanced/page-agent' },
{ title: t('nav.page_agent_core'), path: '/advanced/page-agent-core' },
],
},
]
return (

View File

@@ -0,0 +1,465 @@
import { useTranslation } from 'react-i18next'
import CodeEditor from '@/components/CodeEditor'
import { APIDivider, APIReference, TypeRef } from '@/components/ui/api-reference'
export default function PageAgentCoreDocs() {
const { i18n } = useTranslation()
const isZh = i18n.language === 'zh-CN'
return (
<div>
<h1 className="text-4xl font-bold mb-6">PageAgentCore</h1>
<p className="text-xl text-gray-600 dark:text-gray-300 mb-8 leading-relaxed">
{isZh
? 'PageAgentCore 是不带 UI 的核心 Agent 类。用于需要自定义 UI 或无头运行的场景。'
: 'PageAgentCore is the core Agent class without UI. Use it for custom UI or headless scenarios.'}
</p>
{/* When to use */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">
{isZh ? '何时使用 PageAgentCore' : 'When to Use PageAgentCore'}
</h2>
<ul className="list-disc list-inside text-gray-600 dark:text-gray-400 space-y-2">
<li>{isZh ? '需要自定义 UI 界面' : 'Need a custom UI interface'}</li>
<li>{isZh ? '在自动化测试中无头运行' : 'Running headless in automated tests'}</li>
<li>
{isZh
? '在非浏览器环境运行(需自定义 PageController'
: 'Running in non-browser environments (requires custom PageController)'}
</li>
<li>
{isZh
? '将 PageAgent 嵌入其他 Agent 系统'
: 'Embedding PageAgent in other agent systems'}
</li>
</ul>
</section>
{/* Basic Usage */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '基本用法' : 'Basic Usage'}</h2>
<CodeEditor
language="typescript"
code={`import { PageAgentCore } from '@page-agent/core'
const agent = new PageAgentCore({
baseURL: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'gpt-4o',
})
// Listen to events
agent.addEventListener('statuschange', () => {
console.log('Status:', agent.status)
})
agent.addEventListener('historychange', () => {
console.log('History:', agent.history)
})
agent.addEventListener('activity', (e) => {
const activity = (e as CustomEvent).detail
console.log('Activity:', activity.type)
})
// Execute task
const result = await agent.execute('Fill in the form with test data')`}
/>
</section>
<APIDivider title={isZh ? '配置' : 'Configuration'} />
{/* LLM Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">LLMConfig</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '配置与大语言模型的连接参数。支持 OpenAI 兼容的 API。'
: 'Configure connection parameters for the language model. Supports OpenAI-compatible APIs.'}
</p>
<APIReference
properties={[
{
name: 'baseURL',
type: 'string',
required: true,
description: isZh
? 'LLM API 的基础 URL如 https://api.openai.com/v1'
: 'Base URL of the LLM API (e.g., https://api.openai.com/v1)',
},
{
name: 'apiKey',
type: 'string',
required: true,
description: isZh ? 'API 密钥' : 'API key for authentication',
},
{
name: 'model',
type: 'string',
required: true,
description: isZh
? '模型名称(如 gpt-4o, claude-3.5-sonnet'
: 'Model name (e.g., gpt-4o, claude-3.5-sonnet)',
},
{
name: 'temperature',
type: 'number',
defaultValue: '0',
description: isZh
? '模型温度参数,控制输出随机性'
: 'Model temperature, controls output randomness',
},
{
name: 'maxRetries',
type: 'number',
defaultValue: '3',
description: isZh ? 'API 调用失败时的最大重试次数' : 'Maximum retries on API failure',
},
{
name: 'customFetch',
type: 'typeof fetch',
description: isZh
? '自定义 fetch 函数,用于定制 headers、credentials、代理等'
: 'Custom fetch function for customizing headers, credentials, proxy, etc.',
},
]}
/>
</section>
{/* Agent Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">AgentConfig</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '配置 Agent 的行为、生命周期钩子和扩展能力。'
: 'Configure agent behavior, lifecycle hooks, and extension capabilities.'}
</p>
<APIReference
properties={[
{
name: 'language',
type: "'en-US' | 'zh-CN'",
defaultValue: "'en-US'",
description: isZh ? 'Agent 输出语言' : 'Agent output language',
},
{
name: 'customTools',
type: 'Record<string, PageAgentTool | null>',
status: 'experimental',
description: isZh
? '自定义工具,可扩展或覆盖内置工具。设为 null 可移除工具。'
: 'Custom tools to extend or override built-in tools. Set to null to remove a tool.',
},
{
name: 'instructions',
type: 'InstructionsConfig',
description: isZh
? '指导 Agent 行为的指令配置'
: 'Instructions to guide agent behavior',
},
{
name: 'transformPageContent',
type: '(content: string) => string | Promise<string>',
description: isZh
? '发送给 LLM 前转换页面内容,可用于数据脱敏'
: 'Transform page content before sending to LLM, useful for data masking',
},
{
name: 'experimentalScriptExecutionTool',
type: 'boolean',
defaultValue: 'false',
status: 'experimental',
description: isZh
? '启用实验性 JavaScript 执行工具'
: 'Enable experimental JavaScript execution tool',
},
]}
/>
<h3 className="text-lg font-semibold mt-6 mb-3">InstructionsConfig</h3>
<APIReference
properties={[
{
name: 'system',
type: 'string',
description: isZh
? '全局系统级指令,应用于所有任务'
: 'Global system-level instructions, applied to all tasks',
},
{
name: 'getPageInstructions',
type: '(url: string) => string | undefined | null',
description: isZh
? '动态页面级指令回调,在每个步骤前调用'
: 'Dynamic page-level instructions callback, called before each step',
},
]}
/>
</section>
{/* Lifecycle Hooks */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '生命周期钩子' : 'Lifecycle Hooks'}</h2>
<APIReference
properties={[
{
name: 'onBeforeStep',
type: '(stepCnt: number) => void | Promise<void>',
description: isZh ? '每个步骤执行前调用' : 'Called before each step execution',
status: 'experimental',
},
{
name: 'onAfterStep',
type: '(history: HistoricalEvent[]) => void | Promise<void>',
description: isZh ? '每个步骤执行后调用' : 'Called after each step execution',
status: 'experimental',
},
{
name: 'onBeforeTask',
type: '() => void | Promise<void>',
description: isZh ? '任务开始前调用' : 'Called before task starts',
status: 'experimental',
},
{
name: 'onAfterTask',
type: '(result: ExecutionResult) => void | Promise<void>',
description: isZh ? '任务结束后调用' : 'Called after task ends',
status: 'experimental',
},
{
name: 'onDispose',
type: '(reason?: string) => void',
description: isZh ? 'Agent 销毁时调用' : 'Called when agent is disposed',
status: 'experimental',
},
]}
/>
</section>
{/* PageController Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">PageControllerConfig</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '配置 DOM 提取、元素交互和视觉反馈。'
: 'Configure DOM extraction, element interaction, and visual feedback.'}
</p>
<APIReference
properties={[
{
name: 'pageController',
type: 'PageController',
status: 'experimental',
description: isZh
? '自定义 PageController 实例。如不提供,将创建默认实例。'
: 'Custom PageController instance. If not provided, a default one will be created.',
},
{
name: 'enableMask',
type: 'boolean',
defaultValue: 'true',
description: isZh
? '启用视觉遮罩覆盖层,阻止用户在自动化期间操作'
: 'Enable visual mask overlay that blocks user interaction during automation',
},
{
name: 'viewportExpansion',
type: 'number',
defaultValue: '0',
description: isZh
? '视口扩展像素数,-1 表示提取整个页面'
: 'Viewport expansion in pixels, -1 means extract entire page',
},
{
name: 'interactiveBlacklist',
type: '(Element | (() => Element))[]',
description: isZh ? '要排除的交互元素列表' : 'Elements to exclude from interaction',
},
{
name: 'interactiveWhitelist',
type: '(Element | (() => Element))[]',
description: isZh
? '要强制包含的交互元素列表'
: 'Elements to force include for interaction',
},
{
name: 'include_attributes',
type: 'string[]',
description: isZh
? '在 DOM 提取中包含的额外属性'
: 'Additional attributes to include in DOM extraction',
},
]}
/>
</section>
<APIDivider title={isZh ? '属性与方法' : 'Properties & Methods'} />
{/* Properties */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '属性' : 'Properties'}</h2>
<APIReference
properties={[
{
name: 'status',
type: "'idle' | 'running' | 'completed' | 'error'",
description: isZh ? '当前 Agent 执行状态' : 'Current agent execution status',
},
{
name: 'history',
type: 'HistoricalEvent[]',
description: isZh
? '历史事件数组,构成 Agent 的记忆'
: 'Array of historical events, forms agent memory',
},
{
name: 'task',
type: 'string',
description: isZh ? '当前正在执行的任务' : 'Current task being executed',
},
{
name: 'pageController',
type: 'PageController',
description: isZh
? 'PageController 实例,用于 DOM 操作'
: 'PageController instance for DOM operations',
},
{
name: 'tools',
type: 'Map<string, PageAgentTool>',
description: isZh ? '可用工具的 Map' : 'Map of available tools',
},
{
name: 'onAskUser',
type: '(question: string) => Promise<string>',
description: isZh
? 'Agent 需要用户输入时的回调。未设置则禁用 ask_user 工具。'
: 'Callback when agent needs user input. If not set, ask_user tool is disabled.',
},
]}
/>
</section>
{/* Methods */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '方法' : 'Methods'}</h2>
<APIReference
properties={[
{
name: 'execute(task: string)',
type: 'Promise<ExecutionResult>',
description: isZh
? '执行任务并返回结果。包含 success、data 和 history 字段。'
: 'Execute a task and return result. Contains success, data, and history fields.',
},
{
name: 'pushObservation(content: string)',
type: 'void',
description: isZh
? '向历史流推送一个观察事件,会在下一步时被 LLM 看到'
: 'Push an observation to history stream, will be seen by LLM in next step',
},
{
name: 'emitActivity(activity: AgentActivity)',
type: 'void',
description: isZh
? '发出活动事件用于 UI 反馈'
: 'Emit activity event for UI feedback',
},
{
name: 'dispose(reason?: string)',
type: 'void',
description: isZh
? '销毁 Agent 并清理资源'
: 'Dispose the agent and clean up resources',
},
]}
/>
</section>
{/* Events */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '事件' : 'Events'}</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh ? (
<>
PageAgentCore <TypeRef>EventTarget</TypeRef>
</>
) : (
<>
PageAgentCore extends <TypeRef>EventTarget</TypeRef> and provides the following
events:
</>
)}
</p>
<APIReference
properties={[
{
name: 'statuschange',
type: 'Event',
description: isZh
? 'Agent 状态变化时触发 (idle → running → completed/error)'
: 'Fired when agent status changes (idle → running → completed/error)',
},
{
name: 'historychange',
type: 'Event',
description: isZh
? '历史事件更新时触发(持久化事件,构成 Agent 记忆)'
: 'Fired when history events are updated (persistent, part of agent memory)',
},
{
name: 'activity',
type: 'CustomEvent<AgentActivity>',
description: isZh
? '实时活动反馈(短暂状态,仅用于 UI。类型包括thinking, executing, executed, retrying, error'
: 'Real-time activity feedback (transient, UI only). Types: thinking, executing, executed, retrying, error',
},
{
name: 'dispose',
type: 'Event',
description: isZh ? 'Agent 被销毁时触发' : 'Fired when agent is disposed',
},
]}
/>
</section>
<APIDivider title={isZh ? '类型定义' : 'Type Definitions'} />
{/* ExecutionResult */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">ExecutionResult</h2>
<CodeEditor
language="typescript"
code={`interface ExecutionResult {
/** Whether the task completed successfully */
success: boolean
/** Result description from the agent */
data: string
/** Full execution history */
history: HistoricalEvent[]
}`}
/>
</section>
{/* AgentActivity */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">AgentActivity</h2>
<CodeEditor
language="typescript"
code={`type AgentActivity =
| { type: 'thinking' }
| { type: 'executing'; tool: string; input: unknown }
| { type: 'executed'; tool: string; input: unknown; output: string; duration: number }
| { type: 'retrying'; attempt: number; maxAttempts: number }
| { type: 'error'; message: string }`}
/>
</section>
{/* <APIDivider title={isZh ? '无头模式' : 'Headless Mode'} /> */}
</div>
)
}

View File

@@ -0,0 +1,246 @@
import { useTranslation } from 'react-i18next'
import { Link } from 'wouter'
import CodeEditor from '@/components/CodeEditor'
import { APIReference, TypeRef } from '@/components/ui/api-reference'
export default function PageAgentDocs() {
const { i18n } = useTranslation()
const isZh = i18n.language === 'zh-CN'
return (
<div>
<h1 className="text-4xl font-bold mb-6">PageAgent</h1>
<p className="text-xl text-gray-600 dark:text-gray-300 mb-8 leading-relaxed">
{isZh
? 'PageAgent 是带有内置 UI 面板的完整 Agent 类。它继承自 PageAgentCore并自动创建交互面板。'
: 'PageAgent is the complete Agent class with built-in UI panel. It extends PageAgentCore and automatically creates an interactive panel.'}
</p>
{/* When to use */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">
{isZh ? '何时使用 PageAgent' : 'When to Use PageAgent'}
</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '在大多数场景下,你应该使用 PageAgent。它提供了开箱即用的完整体验'
: 'In most cases, you should use PageAgent. It provides a complete out-of-the-box experience:'}
</p>
<ul className="list-disc list-inside text-gray-600 dark:text-gray-400 space-y-2 mb-6">
<li>
{isZh
? '内置 UI 面板显示任务进度、Agent 思考过程和操作结果'
: 'Built-in UI panel showing task progress, agent thinking, and action results'}
</li>
<li>
{isZh
? '支持 ask_user 工具Agent 可以向用户提问'
: 'Supports ask_user tool for agent to ask questions to users'}
</li>
</ul>
</section>
{/* Basic Usage */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '基本用法' : 'Basic Usage'}</h2>
<CodeEditor
language="typescript"
code={`import { PageAgent } from 'page-agent'
const agent = new PageAgent({
// LLM Configuration (required)
baseURL: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'gpt-4o',
// Optional settings
language: 'en-US',
})
// Execute a task
const result = await agent.execute('Click the login button')
console.log(result.success) // true or false
console.log(result.data) // Task result description
console.log(result.history) // Full execution history`}
/>
</section>
{/* Class Definition */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '类定义' : 'Class Definition'}</h2>
<CodeEditor
language="typescript"
code={`class PageAgent extends PageAgentCore {
/** The UI panel instance */
panel: Panel
constructor(config: PageAgentConfig)
}`}
/>
<p className="text-gray-600 dark:text-gray-400 mt-4">
{isZh ? (
<>
PageAgent {' '}
<Link
href="/advanced/page-agent-core"
className="text-blue-600 dark:text-blue-400 hover:underline"
>
PageAgentCore
</Link>
API PageAgentCore
</>
) : (
<>
PageAgent extends{' '}
<Link
href="/advanced/page-agent-core"
className="text-blue-600 dark:text-blue-400 hover:underline"
>
PageAgentCore
</Link>
. All core methods and events are available. See PageAgentCore docs for detailed API
reference.
</>
)}
</p>
</section>
{/* Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '配置' : 'Configuration'}</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? 'PageAgent 使用与 PageAgentCore 相同的配置接口。'
: 'PageAgent uses the same configuration interface as PageAgentCore.'}
</p>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh ? (
<>
{' '}
<Link
href="/advanced/page-agent-core"
className="text-blue-600 dark:text-blue-400 hover:underline"
>
PageAgentCore
</Link>
</>
) : (
<>
See{' '}
<Link
href="/advanced/page-agent-core"
className="text-blue-600 dark:text-blue-400 hover:underline"
>
PageAgentCore configuration docs
</Link>{' '}
for complete reference.
</>
)}
</p>
</section>
{/* Panel Property */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? 'Panel 属性' : 'Panel Property'}</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? 'PageAgent 自动创建一个 Panel 实例。你可以通过 panel 属性访问它来控制 UI'
: 'PageAgent automatically creates a Panel instance. You can access it via the panel property to control the UI:'}
</p>
<APIReference
properties={[
{
name: 'panel',
type: 'Panel',
required: true,
description: isZh
? '内置的 UI 面板实例,用于显示任务进度和接收用户输入。'
: 'The built-in UI panel instance for displaying task progress and receiving user input.',
},
]}
/>
<h3 className="text-lg font-semibold mt-6 mb-3">{isZh ? 'Panel 方法' : 'Panel Methods'}</h3>
<CodeEditor
language="typescript"
code={`// Show/hide the panel
agent.panel.show()
agent.panel.hide()
// Expand/collapse history view
agent.panel.expand()
agent.panel.collapse()
// Reset panel state
agent.panel.reset()
// Dispose panel (called automatically when agent disposes)
agent.panel.dispose()`}
/>
</section>
{/* Comparison with PageAgentCore */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">
{isZh ? 'PageAgent vs PageAgentCore' : 'PageAgent vs PageAgentCore'}
</h2>
<div className="overflow-hidden rounded-lg border border-gray-200 dark:border-gray-700">
<table className="w-full text-sm">
<thead>
<tr className="bg-gray-50 dark:bg-gray-800/50">
<th className="px-4 py-3 text-left font-medium text-gray-600 dark:text-gray-300">
{isZh ? '特性' : 'Feature'}
</th>
<th className="px-4 py-3 text-center font-medium text-gray-600 dark:text-gray-300">
PageAgent
</th>
<th className="px-4 py-3 text-center font-medium text-gray-600 dark:text-gray-300">
PageAgentCore
</th>
</tr>
</thead>
<tbody className="divide-y divide-gray-100 dark:divide-gray-800">
<tr className="bg-white dark:bg-gray-900">
<td className="px-4 py-3 text-gray-600 dark:text-gray-400">
{isZh ? 'UI 面板' : 'UI Panel'}
</td>
<td className="px-4 py-3 text-center text-green-600 dark:text-green-400"></td>
<td className="px-4 py-3 text-center text-gray-400 dark:text-gray-600">-</td>
</tr>
<tr className="bg-white dark:bg-gray-900">
<td className="px-4 py-3 text-gray-600 dark:text-gray-400">
{isZh ? 'Headless 模式' : 'Headless Mode'}
</td>
<td className="px-4 py-3 text-center text-gray-400 dark:text-gray-600">-</td>
<td className="px-4 py-3 text-center text-green-600 dark:text-green-400"></td>
</tr>
<tr className="bg-white dark:bg-gray-900">
<td className="px-4 py-3 text-gray-600 dark:text-gray-400">
{isZh ? '自定义 PageController' : 'Custom PageController'}
</td>
<td className="px-4 py-3 text-center text-green-600 dark:text-green-400"></td>
<td className="px-4 py-3 text-center text-green-600 dark:text-green-400"></td>
</tr>
<tr className="bg-white dark:bg-gray-900">
<td className="px-4 py-3 text-gray-600 dark:text-gray-400">
{isZh ? '适用场景' : 'Use Case'}
</td>
<td className="px-4 py-3 text-center text-gray-600 dark:text-gray-400">
{isZh ? '网页集成' : 'Web integration'}
</td>
<td className="px-4 py-3 text-center text-gray-600 dark:text-gray-400">
{isZh ? '自定义 UI / 无头' : 'Custom UI / Headless'}
</td>
</tr>
</tbody>
</table>
</div>
</section>
</div>
)
}

View File

@@ -3,6 +3,9 @@ import { Route, Switch } from 'wouter'
import Header from '../../components/Header'
import DocsLayout from './Layout'
import PageAgentCoreDocs from './advanced/page-agent-core/page'
// Advanced
import PageAgentDocs from './advanced/page-agent/page'
import Instructions from './features/custom-instructions/page'
// Features
import CustomTools from './features/custom-tools/page'
@@ -11,7 +14,6 @@ import Models from './features/models/page'
import BestPractices from './integration/best-practices/page'
// Integration
import CdnSetup from './integration/cdn-setup/page'
import Configuration from './integration/configuration/page'
import SecurityPermissions from './integration/security-permissions/page'
import ThirdPartyAgent from './integration/third-party-agent/page'
import Limitations from './introduction/limitations/page'
@@ -83,11 +85,6 @@ export default function DocsRouter() {
<SecurityPermissions />
</DocsPage>
</Route>
<Route path="/integration/configuration">
<DocsPage>
<Configuration />
</DocsPage>
</Route>
<Route path="/integration/best-practices">
<DocsPage>
<BestPractices />
@@ -99,6 +96,18 @@ export default function DocsRouter() {
</DocsPage>
</Route>
{/* Advanced */}
<Route path="/advanced/page-agent">
<DocsPage>
<PageAgentDocs />
</DocsPage>
</Route>
<Route path="/advanced/page-agent-core">
<DocsPage>
<PageAgentCoreDocs />
</DocsPage>
</Route>
{/* Default redirect or 404 */}
<Route path="/docs">
<DocsPage>

View File

@@ -1,199 +0,0 @@
import { useTranslation } from 'react-i18next'
import CodeEditor from '@/components/CodeEditor'
export default function Configuration() {
const { i18n } = useTranslation()
const isZh = i18n.language === 'zh-CN'
return (
<div>
<h1 className="text-4xl font-bold mb-6">{isZh ? '配置选项' : 'Configuration'}</h1>
<p className="text-xl text-gray-600 dark:text-gray-300 mb-8 leading-relaxed">
{isZh
? 'PageAgent 的完整配置接口定义。'
: 'Complete configuration interface for PageAgent.'}
</p>
{/* LLM Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? 'LLM 配置' : 'LLM Configuration'}</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '配置与大语言模型的连接参数。'
: 'Configure connection parameters for the language model.'}
</p>
<CodeEditor
className="mb-4"
language="typescript"
code={`interface LLMConfig {
baseURL: string
apiKey: string
model: string
temperature?: number
maxRetries?: number
/**
* Custom fetch function for LLM API requests.
* Use this to customize headers, credentials, proxy, etc.
* The response should follow OpenAI API format.
*/
customFetch?: typeof globalThis.fetch
}`}
/>
</section>
{/* Agent Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">
{isZh ? 'Agent 配置' : 'Agent Configuration'}
</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '配置 Agent 的行为、生命周期钩子和扩展能力。'
: 'Configure agent behavior, lifecycle hooks, and extension capabilities.'}
</p>
<CodeEditor
className="mb-4"
language="typescript"
code={`interface AgentConfig {
language?: 'en-US' | 'zh-CN'
/**
* Whether to prompt for next task after task completion
* @default true
*/
promptForNextTask?: boolean
/**
* Enable the UI panel for visual feedback and user interaction
* When disabled, the panel will not be created and all UI operations will be skipped.
* Useful for automated testing or when integrating PageAgent as a library.
* @default true
*/
enablePanel?: boolean
/**
* Enable the ask_user tool for agent to ask questions
* When disabled, the agent cannot ask user questions during execution.
* @default true
*/
enableAskUser?: boolean
/** Custom tools to extend or override built-in tools */
customTools?: Record<string, PageAgentTool | null>
/** Instructions to guide the agent's behavior */
instructions?: {
/** Global system-level instructions, applied to all tasks */
system?: string
/** Dynamic page-level instructions callback */
getPageInstructions?: (url: string) => string | undefined | null
}
// Lifecycle hooks (with \`this\` bound to PageAgent instance)
onBeforeStep?: (this: PageAgent, stepCnt: number) => Promise<void> | void
onAfterStep?: (this: PageAgent, stepCnt: number, history: HistoryEvent[]) => Promise<void> | void
onBeforeTask?: (this: PageAgent) => Promise<void> | void
onAfterTask?: (this: PageAgent, result: ExecutionResult) => Promise<void> | void
onDispose?: (this: PageAgent, reason?: string) => void
/**
* Transform page content before sending to LLM.
* Use cases: inspect extraction results, modify page info, mask sensitive data.
*/
transformPageContent?: (content: string) => Promise<string> | string
/** @experimental Enable JavaScript execution tool */
experimentalScriptExecutionTool?: boolean
}`}
/>
</section>
{/* PageController Configuration */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">
{isZh ? 'PageController 配置' : 'PageController Configuration'}
</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '配置 DOM 提取、元素交互和视觉高亮的细节。'
: 'Configure DOM extraction, element interaction, and visual highlighting.'}
</p>
<CodeEditor
className="mb-4"
language="typescript"
code={`interface DomConfig {
/** Elements to exclude from interaction */
interactiveBlacklist?: (Element | (() => Element))[]
/** Elements to force include for interaction */
interactiveWhitelist?: (Element | (() => Element))[]
/** Additional attributes to include in DOM extraction */
include_attributes?: string[]
/** Highlight overlay opacity (0-1) */
highlightOpacity?: number
/** Highlight label opacity (0-1) */
highlightLabelOpacity?: number
}
interface PageControllerConfig extends DomConfig {
/** Viewport expansion in pixels */
viewportExpansion?: number
/** Enable visual mask overlay during operations (default: false) */
enableMask?: boolean
}`}
/>
</section>
{/* Complete Type */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">{isZh ? '完整类型' : 'Complete Type'}</h2>
<CodeEditor
language="typescript"
code={`type PageAgentConfig = LLMConfig & AgentConfig & PageControllerConfig`}
/>
</section>
{/* Programmatic Usage Example */}
<section className="mb-10">
<h2 className="text-2xl font-semibold mb-4">
{isZh ? '程序化使用配置' : 'Programmatic Usage'}
</h2>
<p className="text-gray-600 dark:text-gray-400 mb-4">
{isZh
? '对于程序化集成场景,可以禁用 UI。'
: 'For programmatic integration, you can disable UI.'}
</p>
<CodeEditor
language="typescript"
code={`const agent = new PageAgent({
baseURL: 'https://api.openai.com/v1',
apiKey: 'your-api-key',
model: 'your-model-name',
// Disable all UI features for pure programmatic usage
enablePanel: false, // Don't create Panel UI
enableMask: false, // Don't show visual overlay (mask and pointer)
// enableAskUser is automatically disabled when enablePanel is false
// Or keep Panel but disable post-task prompts
// enablePanel: true,
// promptForNextTask: false,
})
// Pure programmatic execution
const result = await agent.execute('search for TypeScript documentation')
console.log(result.success, result.data, result.history)`}
/>
</section>
</div>
)
}