zoushiyang/page-agent

Fork 0

Files

Simon 4dc332a32c docs: update changelog

2026-02-12 17:23:35 +08:00

4.9 KiB

Raw Blame History

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.2.0] - 2026-02-11

Features

Observe Phase - Agent now observes the page before each action, improving decision accuracy on dynamic pages
Better Abort Handling - Improved abortSignal support for cleaner task cancellation

Improvements

Pruned system prompts for lower token usage and faster responses
Improved error handling during agent steps with better error messages
Zod tree-shaking for smaller bundle size

Bug Fixes

Fixed indentation lost in DOM extraction caused by trimLines
Fixed gpt-5-mini temperature configuration

[1.1.0] - 2026-02-02

Features

Custom System Prompt - New systemPrompt config option to customize or extend the default system prompt
Chrome Extension - Extension with multi-tab control, main-world API with token auth, and tab lifecycle management

Improvements

Renamed include_attributes to includeAttributes in PageController config (camelCase consistency)
Lazy-loaded mask module for faster initialization
Better date formatting and error messages from LLM client
Added rawRequest to step history for easier debugging

Bug Fixes

Fixed CSP errors by using local SVGs for cursor mask instead of inline styles
Fixed AbortError being incorrectly retried and shown to users
Fixed mask not working correctly when starting a new task after stopping a previous one

[1.0.0] - 2026-01-19

🎉 First Stable Release

PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.

Features

Core

PageAgent - Main entry class with built-in UI Panel
PageAgentCore - Headless agent class for custom UI or programmatic use
DOM Analysis - Text-based DOM extraction with high-intensity dehydration
LLM Support - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
Tool System - Built-in tools for click, input, scroll, select, and more
Custom Tools - Extend agent capabilities with your own tools (experimental)
Lifecycle Hooks - Hook into agent execution (experimental)
Instructions System - System-level and page-level instructions to guide agent behavior
Data Masking - Transform page content before sending to LLM

Page Controller

Element Interactions - Click, input text, select options, scroll
Visual Mask - Blocks user interaction during automation
DOM Tree Extraction - Efficient page structure extraction for LLM consumption

UI

Interactive Panel - Real-time task progress and agent thinking display
Ask User Tool - Agent can ask users for clarification
i18n Support - English and Chinese localization

Configuration

interface PageAgentConfig {
    // LLM Configuration (required)
    baseURL: string
    apiKey: string
    model: string
    temperature?: number
    maxRetries?: number
    customFetch?: typeof fetch

    // Agent Configuration
    language?: 'en-US' | 'zh-CN'
    maxSteps?: number // default: 20
    customTools?: Record<string, PageAgentTool> // experimental
    instructions?: InstructionsConfig
    transformPageContent?: (content: string) => string | Promise<string>
    experimentalScriptExecutionTool?: boolean // default: false

    // Lifecycle Hooks (experimental)
    onBeforeTask?: (agent, result) => void
    onAfterTask?: (agent, result) => void
    onBeforeStep?: (agent, stepCount) => void
    onAfterStep?: (agent, history) => void
    onDispose?: (agent, reason?) => void

    // Page Controller Configuration
    enableMask?: boolean // default: true
    viewportExpansion?: number
    interactiveBlacklist?: Element[]
    interactiveWhitelist?: Element[]
}

Packages

Package	Description
`page-agent`	Main entry with UI Panel
`@page-agent/core`	Core agent logic without UI
`@page-agent/llms`	LLM client with retry logic
`@page-agent/page-controller`	DOM operations and visual feedback
`@page-agent/ui`	Panel and i18n

Known Limitations

Single-page application only (cannot navigate across pages)
No visual recognition (relies on DOM structure)
Limited interaction support (no hover, drag-drop, canvas operations)
See Limitations for details

Acknowledgments

This project builds upon the excellent work of browser-use. DOM processing components and prompts are adapted from browser-use (MIT License).

4.9 KiB Raw Blame History

Changelog

[1.2.0] - 2026-02-11

Features

Improvements

Bug Fixes

[1.1.0] - 2026-02-02

Features

Improvements

Bug Fixes

[1.0.0] - 2026-01-19

🎉 First Stable Release

Features

Core

Page Controller

UI

Configuration

Packages

Known Limitations

Acknowledgments

4.9 KiB

Raw Blame History