Files
page-agent/packages/extension/docs/extension_api.md
2026-06-11 19:53:30 +08:00

5.9 KiB

Page Agent Extension API

Integrate the Page Agent extension into your web app and trigger multi-page browser tasks from page JavaScript.

Installation

1. Install the browser extension

Primary channel:

Latest updates are often published earlier on:

npm install @page-agent/core --save-dev

3. Authorization (Token)

The token allows your page JS to call the extension API (window.PAGE_AGENT_EXT) and execute multi-page tasks.

Why token-based access is required:

  • The extension has broad browser permissions (page access, navigation, multi-tab control).
  • If abused, it can harm user privacy and security.
  • Users must explicitly provide the token only to applications they trust.

Setup:

  1. Open the extension side panel and copy your auth token.
  2. Set the token in your page:
localStorage.setItem('PageAgentExtUserAuthToken', 'your-token')

Quick Start

import type { AgentActivity, AgentStatus, ExecutionResult, HistoricalEvent } from '@page-agent/core'

// Wait for extension injection (up to 1 second)
async function waitForExtension(timeout = 1000): Promise<boolean> {
    const start = Date.now()
    while (Date.now() - start < timeout) {
        if (window.PAGE_AGENT_EXT) return true
        await new Promise((r) => setTimeout(r, 100))
    }
    return false
}

// Usage
if (await waitForExtension()) {
    const result = await window.PAGE_AGENT_EXT!.execute('Click the login button', {
        baseURL: 'https://api.openai.com/v1',
        apiKey: 'your-api-key',
        model: 'gpt-5.2',
        onStatusChange: (status) => console.log('Status:', status),
        onActivity: (activity) => console.log('Activity:', activity),
    })
    console.log('Result:', result)
}

Global API

After token match, the extension injects APIs into window.

window.PAGE_AGENT_EXT_VERSION

Extension version string (for capability checks before using the main API).

window.PAGE_AGENT_EXT

Main namespace object.

PAGE_AGENT_EXT.execute(task, config)

Execute one agent task.

Parameters:

Name Type Required Description
task string Yes Task description
config ExecuteConfig Yes LLM settings, options, and callbacks

Returns: Promise<ExecutionResult>

PAGE_AGENT_EXT.stop()

Stop the current task.

Types

Install @page-agent/core for complete types:

import type { AgentActivity, AgentStatus, ExecutionResult, HistoricalEvent } from '@page-agent/core'

export interface ExecuteConfig {
    baseURL: string
    model: string
    apiKey?: string

    // Global system-level instructions for the agent.
    // Equivalent to AgentConfig.instructions.system.
    systemInstruction?: string

    // Include the initial tab where page JS starts. Default: true.
    includeInitialTab?: boolean

    // Control all unpinned tabs in the window instead of only the tab group.
    // When enabled, agent sees and can switch to every non-pinned tab.
    // Default: false. Experimental.
    experimentalIncludeAllTabs?: boolean

    onStatusChange?: (status: AgentStatus) => void
    onActivity?: (activity: AgentActivity) => void
    onHistoryUpdate?: (history: HistoricalEvent[]) => void
}

export type Execute = (task: string, config: ExecuteConfig) => Promise<ExecutionResult>

AgentStatus

type AgentStatus = 'idle' | 'running' | 'completed' | 'error' | 'stopped'

AgentActivity

type AgentActivity =
    | { type: 'thinking' }
    | { type: 'executing'; tool: string; input: unknown }
    | { type: 'executed'; tool: string; input: unknown; output: string; duration: number }
    | { type: 'retrying'; attempt: number; maxAttempts: number }
    | { type: 'error'; message: string }

HistoricalEvent

type HistoricalEvent =
    | { type: 'step'; stepIndex: number; reflection: AgentReflection; action: Action }
    | { type: 'observation'; content: string }
    | { type: 'user_takeover' }
    | { type: 'retry'; message: string; attempt: number; maxAttempts: number }
    | { type: 'error'; message: string; rawResponse?: unknown }

ExecutionResult

interface ExecutionResult {
    success: boolean
    data: string
    history: HistoricalEvent[]
}

Usage Examples

Basic Execution

const result = await window.PAGE_AGENT_EXT!.execute(
    'Fill in the email field with test@example.com and click Submit',
    {
        baseURL: 'https://api.openai.com/v1',
        apiKey: process.env.OPENAI_API_KEY!,
        model: 'gpt-5.2',
        includeInitialTab: false, // Optional: exclude current tab
        onStatusChange: (status) => console.log(status),
        onActivity: (activity) => console.log(activity),
    }
)

Stop the Current Task

window.PAGE_AGENT_EXT!.stop()

Window Type Declaration

If you are not importing @page-agent/core, add:

import type { AgentActivity, AgentStatus, ExecutionResult, HistoricalEvent } from '@page-agent/core'

interface ExecuteConfig {
    baseURL: string
    model: string
    apiKey?: string

    systemInstruction?: string

    includeInitialTab?: boolean
    experimentalIncludeAllTabs?: boolean
    onStatusChange?: (status: AgentStatus) => void
    onActivity?: (activity: AgentActivity) => void
    onHistoryUpdate?: (history: HistoricalEvent[]) => void
}

declare global {
    interface Window {
        PAGE_AGENT_EXT_VERSION?: string
        PAGE_AGENT_EXT?: {
            version: string
            execute: Execute
            stop: () => void
        }
    }
}