zoushiyang/page-agent

Simon c04c02723d chore(version): bump version to 1.10.0

2026-06-15 17:31:54 +08:00

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.10.0] - 2026-06-15

Breaking Changes

Agent run lifecycle rework - stop() is now async and resolves only after the run fully settles. Run status is decoupled from task outcome: a new stopped state was added, and LLM self-reported failures now end as completed. Lifecycle hooks re-throw instead of folding errors into the result, agent errors are recorded in history, and agent.lastResult was added.
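The decoupling of run status from task outcome can be pictured with a small model. This is a sketch under stated assumptions, not the library's implementation: `SketchAgent`, `RunStatus`, `RunResult`, and the `execute(task)` signature are hypothetical, and only `stop()`, `lastResult`, and the `stopped` / `completed` states are taken from the entry above.

```typescript
// Hypothetical model of the reworked lifecycle; SketchAgent is not a library class.
type RunStatus = "idle" | "running" | "completed" | "stopped" | "error";

interface RunResult {
  success: boolean; // task outcome as self-reported by the model
  data: string;
}

class SketchAgent {
  status: RunStatus = "idle";
  lastResult: RunResult | null = null;
  private controller: AbortController | null = null;
  private settled: Promise<void> = Promise.resolve();

  // `task` stands in for the agent loop and receives the abort signal.
  execute(task: (signal: AbortSignal) => Promise<RunResult>): Promise<RunResult | null> {
    const controller = new AbortController();
    this.controller = controller;
    this.status = "running";
    const run = task(controller.signal).then(
      (result): RunResult | null => {
        this.lastResult = result;
        // Run status ignores result.success: a self-reported failure is still "completed".
        this.status = controller.signal.aborted ? "stopped" : "completed";
        return result;
      },
      (err: unknown): RunResult | null => {
        if (controller.signal.aborted) {
          this.status = "stopped";
          return null;
        }
        this.status = "error";
        throw err;
      },
    );
    // Track settlement without propagating rejections to stop().
    this.settled = run.then(() => undefined, () => undefined);
    return run;
  }

  // Resolves only after the in-flight run (if any) has settled.
  async stop(): Promise<void> {
    this.controller?.abort();
    await this.settled;
  }
}
```

In this model a caller that needs to know whether the task succeeded reads `lastResult.success`, while `status` only says how the run ended.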

Features

Abortable JavaScript execution - execute_javascript now honors the AbortSignal.
Leaner agent prompts - Simplified the waiting-response flow and removed navigation-back instructions to reduce LLM cognitive load.
MultiPageAgent safety - Disabled ScriptExecutionTool for MultiPageAgent.
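For the abortable `execute_javascript` item, one common way to make an async operation honor an `AbortSignal` is to race it against the signal's `abort` event. The helper below is a minimal sketch: `runAbortable` is a hypothetical name, and it only stops waiting on the work, since already-running synchronous JavaScript cannot be interrupted this way.

```typescript
// Hypothetical helper: rejects with an AbortError as soon as the signal fires.
function runAbortable<T>(work: () => Promise<T>, signal: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const makeAbortError = (): Error => {
      const err = new Error("Task aborted");
      err.name = "AbortError";
      return err;
    };
    if (signal.aborted) {
      reject(makeAbortError());
      return;
    }
    const onAbort = (): void => reject(makeAbortError());
    const cleanup = (): void => signal.removeEventListener("abort", onAbort);
    signal.addEventListener("abort", onAbort, { once: true });
    work().then(
      (value) => {
        cleanup();
        resolve(value);
      },
      (err: unknown) => {
        cleanup();
        reject(err);
      },
    );
  });
}
```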

Improvements

Dark mode detection - Refined detection heuristics and made isMainContentDark less aggressive by checking html and body data attributes independently.
Extension lifecycle robustness - Drove heartbeat and running state from status changes, cleared stale activity on any non-running status, handled the stopped lifecycle state, and cleared currentTabId on TabsController.init.

Bug Fixes

Accurate wait reporting - Wait steps now report the actual wait duration.
Scroll predicates - Scroll predicates now return booleans.
Docs - Fixed the broken demo video on GitHub.

[1.9.0] - 2026-06-08

Features

Robust abort handling - Rewrote the aborting system; sync tools, loop execution, and LLM clients now correctly respect task abort signals (ctx.signal). Also decoupled AbortError from InvokeError in @page-agent/llms.
Claude Opus 4.8 support - Added support for the Claude Opus 4.8 model.

Improvements

Concurrency guard - Prevented concurrent execute() calls on a single PageAgent/Core instance to avoid race conditions.
Model recommendations refresh - Updated default and tested model list recommendations.
Test coverage - Added comprehensive Vitest unit tests for the @page-agent/llms package.
Improved documentation - Added website documentation for the ctx.signal abort contract and execute() concurrency rules.
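The concurrency guard item can be illustrated with a busy flag that rejects a second `execute()` while the first is still in flight. This is only a sketch of the general technique: `GuardedAgent`, the error message, and the `task` parameter are hypothetical, and the real guard may behave differently (for example, by queuing).

```typescript
// Hypothetical guard: one execute() at a time per instance.
class GuardedAgent {
  private busy = false;

  async execute(task: () => Promise<string>): Promise<string> {
    if (this.busy) {
      throw new Error("execute() is already running on this instance");
    }
    this.busy = true;
    try {
      return await task();
    } finally {
      // Always release the flag, even when the task throws.
      this.busy = false;
    }
  }
}
```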

Bug Fixes

DTS bundle fix - Fixed a packaging bug where global type declarations were incorrectly bundled into .d.ts outputs.
Website sidebar fix - Normalized trailing slashes in the website's sidebar location comparison.
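The sidebar fix comes down to comparing paths after stripping trailing slashes. A minimal sketch, assuming plain string paths; `normalizePath` and `isActiveLink` are hypothetical names rather than the website's actual helpers:

```typescript
// Strip trailing slashes, but keep the root path as "/".
function normalizePath(path: string): string {
  const trimmed = path.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Two locations match when they are equal after normalization.
function isActiveLink(currentPath: string, linkPath: string): boolean {
  return normalizePath(currentPath) === normalizePath(linkPath);
}
```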

[1.8.2] - 2026-05-11

Features

IIFE demo control - Added showPanel and autoInit switches to the IIFE CDN script to control whether the UI panel automatically displays or initializes on load.

Improvements

Build toolchain modernization - Upgraded build infrastructure to Vite 8.

Bug Fixes

TypeScript InvokeErrorType fix - Separated the value and type space for InvokeErrorType to resolve TypeScript compilation issues.
Website chunking fix - Restored working code-splitting with manualChunks for the documentation website.
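For the `InvokeErrorType` item: TypeScript keeps values and types in separate namespaces, so one name can be declared once as a runtime object and once as a type. The sketch below shows that general pattern only. The member names and `isRetryable` are hypothetical, and the actual fix in `@page-agent/llms` may be structured differently.

```typescript
// Value space: a plain object that exists at runtime.
const InvokeErrorType = {
  NETWORK_ERROR: "network_error", // hypothetical member
  RATE_LIMIT: "rate_limit", // hypothetical member
  INVALID_RESPONSE: "invalid_response", // hypothetical member
} as const;

// Type space: the union of the object's values, under the same name.
type InvokeErrorType = (typeof InvokeErrorType)[keyof typeof InvokeErrorType];

// The name works as a type in the signature and as a value in the body.
function isRetryable(type: InvokeErrorType): boolean {
  return type === InvokeErrorType.NETWORK_ERROR || type === InvokeErrorType.RATE_LIMIT;
}
```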

[1.8.1] - 2026-04-27

Features

GPT-5.4 & Qwen 3.6 support - Added support for gpt-5.4 and qwen3.6-max/flash in the recommended LLM list.
Custom LLM request hook - Added a transformRequestBody hook to allow custom modification of payloads before sending requests to LLM providers.
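A request-body hook like `transformRequestBody` is typically a function applied to the payload just before it is sent. The sketch below assumes a synchronous hook that receives and returns a plain object. That signature, along with `prepareRequestBody` and `stripToolChoice`, is an assumption for illustration and is not taken from the package.

```typescript
type RequestBody = Record<string, unknown>;
// Assumed hook shape: takes the outgoing payload, returns the payload to send.
type TransformRequestBody = (body: RequestBody) => RequestBody;

// Hypothetical call site: copy the payload, then let the hook adjust it.
function prepareRequestBody(base: RequestBody, transformRequestBody?: TransformRequestBody): RequestBody {
  const copy: RequestBody = { ...base };
  return transformRequestBody ? transformRequestBody(copy) : copy;
}

// Example hook: drop a field that a given provider rejects.
const stripToolChoice: TransformRequestBody = (body) => {
  const next: RequestBody = { ...body };
  delete next["tool_choice"];
  return next;
};
```

Copying before transforming keeps the caller's original payload untouched, which makes hooks easier to reason about.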

Improvements

Accessibility (a11y) enhancements - Added descriptive accessible labels to ConfigPanel input fields and icon buttons in compliance with WCAG 4.1.2.
UI polish - Improved HistoryList loading and empty states, and added helpful tooltips for actions.
Prompt caching guidance - Added website documentation for prompt caching optimization.
Build speedups - Added parallel build scripts to accelerate local development compilation.

Bug Fixes

DeepSeek tool choice fix - Disabled explicit tool_choice for DeepSeek models to avoid API compatibility errors.
MCP version advertising - MCP server now correctly advertises its package version.

[1.8.0] - 2026-04-15

Breaking Changes

TypeScript 6 & ESLint 10 upgrade - Major toolchain modernization. Upgraded the entire monorepo to TypeScript 6 and ESLint 10 with source-first monorepo resolution (library exports resolve to source files directly during local development).

Improvements

MCP security hardening - Bound the MCP HTTP + WebSocket server to localhost only.
Extension UI refinement - Made the history panel height responsive to the viewport and improved the result card readability by increasing font size.

Bug Fixes

SimulatorMask memory leak - Fixed a memory leak by ensuring the requestAnimationFrame loop is cancelled when disposing the SimulatorMask.
Autofixer format fix - Corrected the fallback action format for autoFixer when waiting.
IIFE scope protection - Fixed a name collision in IIFE builds by preventing global helper function re-declarations.
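The SimulatorMask leak is an instance of a general rule: a self-rescheduling `requestAnimationFrame` loop keeps running until its pending frame is cancelled. The sketch below shows that pattern with an injected scheduler so it also runs outside a browser. `AnimationLoop` and `FrameScheduler` are hypothetical names and this is not the SimulatorMask code.

```typescript
// Abstraction over requestAnimationFrame / cancelAnimationFrame.
interface FrameScheduler {
  request(callback: () => void): number;
  cancel(id: number): void;
}

class AnimationLoop {
  private frameId: number | null = null;
  private readonly scheduler: FrameScheduler;
  private readonly draw: () => void;

  constructor(scheduler: FrameScheduler, draw: () => void) {
    this.scheduler = scheduler;
    this.draw = draw;
  }

  start(): void {
    if (this.frameId !== null) return; // already running
    const tick = (): void => {
      this.draw();
      // Remember the newest frame id so dispose() can cancel it.
      this.frameId = this.scheduler.request(tick);
    };
    this.frameId = this.scheduler.request(tick);
  }

  dispose(): void {
    if (this.frameId === null) return;
    this.scheduler.cancel(this.frameId);
    this.frameId = null;
  }
}
```

In a browser the scheduler would simply forward to `requestAnimationFrame` and `cancelAnimationFrame`.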

[1.7.1] - 2026-04-04

Features

Optional keepSemanticTags - Added an experimental keepSemanticTags config to preserve semantic structure in PageController output
Per-task extension system instructions - Extension ExecuteConfig now supports systemInstruction

Improvements

Smarter scroll handling - Scroll container detection and scroll direction handling are more reliable
Better accessibility-aware element detection - Interactive candidates with supported ARIA attributes and role="listitem" are recognized more accurately

Bug Fixes

Fixed iframe-origin filtering for extension postMessage listeners
Avoided a currentScript null pointer during deferred initialization

[1.7.0] - 2026-03-31

More reliable click actions - Click handling now reuses pointer coordinates, verifies targets with elementFromPoint, and behaves better on layered layouts
Better mask event handling - SimulatorMask now supports passthrough events when automation should not fully swallow input
Fixed a SimulatorMask memory leak

[1.6.3] - 2026-03-30

Features

Experimental all-tabs control - Extension can include and control all browser tabs via experimentalIncludeAllTabs

Improvements

Calmer empty state motion - Disabled the EmptyState auto-start animation in the extension UI
Cleaner extension docs - Simplified setup and tab-control documentation across the README and developer guide

Bug Fixes

Fixed new-tab detection from content scripts
Fixed tab deduplication and multi-window handling in the extension

[1.6.2] - 2026-03-25

Longer task input - The UI task input now accepts up to 1000 characters
Contributor docs refresh - Added a maintainer note and refreshed contributor-facing documentation
Fixed lint issues in the release pipeline

[1.6.1] - 2026-03-22

Internal PageController action exports - PageController actions are now exposed as internal methods for easier reuse across packages
Expanded docs - Added MCP docs and clarified project limitations and homepage details

[1.6.0] - 2026-03-21

Features

Beta MCP support - New @page-agent/mcp package lets MCP clients such as Claude Desktop and Copilot control the browser through the Page Agent extension
Better iframe handling - Same-origin iframe elements are handled more reliably during DOM extraction and actions
Extension history workflows - Users can rerun past tasks, export history sessions as JSON, and approve MCP-triggered tasks before execution

Improvements

Unified versioning across packages - The extension now follows the root workspace version. Changelog entries are no longer split into a separate extension version section
Configurable stepDelay - Agent pacing between steps is now configurable via stepDelay
Optional API key - apiKey can now be omitted for compatible deployments that do not require one
Optional named tool choice - Tool invocation can disable named tool choice for providers that behave better without it
Better rich-text input support - Improved contenteditable handling with better event dispatching and execCommand fallback for more editors
More flexible DOM extraction - includeAttributes now supports wildcards, contenteditable is included by default, and heuristically interactive elements expose more useful attributes
MiniMax model support - Added MiniMax compatibility, with the default recommendation updated to MiniMax-M2.7
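Wildcard support in `includeAttributes` can be sketched as glob-style matching on attribute names. The example assumes `*` matches any run of characters (for instance `data-*`); the exact wildcard syntax the library accepts is not stated here, and `matchesPattern` and `pickAttributes` are hypothetical helpers.

```typescript
// Turn a pattern such as "data-*" into an anchored regular expression.
function matchesPattern(name: string, pattern: string): boolean {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, ".*"); // "*" matches any run of characters
  return new RegExp(`^${escaped}$`).test(name);
}

// Keep only the attributes whose names match at least one pattern.
function pickAttributes(
  attributes: Record<string, string>,
  includeAttributes: string[],
): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const [name, value] of Object.entries(attributes)) {
    if (includeAttributes.some((pattern) => matchesPattern(name, pattern))) {
      kept[name] = value;
    }
  }
  return kept;
}
```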

Bug Fixes

Fixed Safari issues when requestIdleCallback is unavailable
Avoided throwing when webgl2 initialization fails
Improved OpenAI-compatible request patches for GPT-5.4 chat tools and MiniMax temperature/tool-call compatibility
Fixed several UI polish issues in the extension and website, including cursor and layout regressions

[1.5.1] - 2026-03-05

Breaking Changes

data-browser-use-ignore → data-page-agent-ignore - DOM ignore attribute renamed to match the project identity
Config types restructured - PageAgentConfig split into AgentConfig + PageAgentCoreConfig; config definitions moved from config/index.ts to types.ts
Zod v3/v4 dual support - Libraries now accept both zod@^3.25 and zod@^4.0 as peer dependencies

Features

Experimental llms.txt support - Agent can fetch and include a site's llms.txt in context. Enable via experimentalLlmsTxt: true
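As a rough illustration of what fetching a site's `llms.txt` involves, the sketch below resolves the file at the root of the current page's origin and treats any failure as "no file". Both the root location and the fall-back behavior are assumptions, and `llmsTxtUrl`, `loadLlmsTxt`, and the injected `fetchText` are hypothetical.

```typescript
// Resolve /llms.txt against the page URL; path, query, and hash are dropped.
function llmsTxtUrl(pageUrl: string): string {
  return new URL("/llms.txt", pageUrl).href;
}

// `fetchText` is injected so the sketch does not depend on a network stack.
async function loadLlmsTxt(
  pageUrl: string,
  fetchText: (url: string) => Promise<string | null>,
): Promise<string | null> {
  try {
    return await fetchText(llmsTxtUrl(pageUrl));
  } catch {
    return null; // a missing or unreachable llms.txt should not fail the task
  }
}
```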

Improvements

Default maxSteps changed from 20 to 40 to better handle complex tasks out of the box
Added 400ms wait between agent steps for page reactions
Increased click wait time (100ms → 200ms) for more reliable interactions
Removed debug console.log statements from scroll actions
Reset observations on new task start
Improved logging across packages

Extension v0.1.9

PageAgent 1.5.1

Advanced config panel - New collapsible section exposing Max Steps, System Instruction, and experimental llms.txt toggle
Streamlined User Auth Token description
Moved testing API notice below auth token section

[1.4.0] - 2026-02-27

Features

Updated Terms of Use and Privacy Policy
Robust tool-call validation - Action inputs are now validated against tool schemas individually, producing clear error messages (e.g. Invalid input for action "click_element_by_index") instead of unreadable union parse errors
Primitive action input coercion - Small models that output {"click_element_by_index": 2} instead of {"click_element_by_index": {"index": 2}} are now auto-corrected using tool schemas
Qwen model updates - Added qwen3.5-plus as the default free testing model; disabled enable_thinking for Qwen models to avoid incompatible responses
Updated default LLM endpoint - Migrated demo and extension to a new testing endpoint with legacy endpoint auto-migration
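The primitive input coercion item can be sketched as follows: when a tool takes exactly one input field and the model supplies a bare primitive, wrap the primitive under that field's name. The library derives this from its tool schemas; the `ToolFields` map below is a simplified, hypothetical stand-in for those schemas, and `coerceAction` is not a library function.

```typescript
// Stand-in for real tool schemas: tool name -> names of its input fields.
type ToolFields = Record<string, readonly string[]>;
type Action = Record<string, unknown>;

function coerceAction(action: Action, tools: ToolFields): Action {
  const fixed: Action = {};
  for (const [tool, input] of Object.entries(action)) {
    const fields = tools[tool];
    // Only single-field tools can be coerced unambiguously.
    const only = fields !== undefined && fields.length === 1 ? fields[0] : undefined;
    const isPrimitive = ["string", "number", "boolean"].includes(typeof input);
    if (only !== undefined && isPrimitive) {
      fixed[tool] = { [only]: input }; // 2 becomes { index: 2 }
    } else {
      fixed[tool] = input; // leave well-formed or ambiguous input alone
    }
  }
  return fixed;
}
```

Inputs that are already objects, or that target tools with several fields, pass through unchanged so that normal schema validation can report them.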

Improvements

Unified zod imports (* as z) across all packages for consistency
Better Zod error formatting with z.prettifyError() in LLM client
Exported InvokeError and InvokeErrorType as values (not just types) from @page-agent/llms
Exported SupportedLanguage type from @page-agent/core

Extension v0.1.8

Language setting - Added language selector (System / English / 中文) in config panel
UI makeover - New empty state with breathing glow and typing animation; ai-motion glow overlay while running; refined focus styles
Testing endpoint notice - Shows terms of use notice when using the free testing API
Legacy endpoint migration - Auto-migrates old Supabase testing endpoint to new endpoint on startup

[1.3.0] - 2026-02-13

Breaking Changes

Lifecycle: stop() vs dispose() - New stop() method to cancel the current task while keeping the agent reusable. dispose() is now terminal — a disposed agent cannot be reused. This affects both PageAgentCore and PanelAgentAdapter.
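The difference between a reusable `stop()` and a terminal `dispose()` can be modeled with a single flag. This is a simplified sketch: `ReusableAgent` and its `start()` method are hypothetical, and `stop()` is synchronous here only for brevity.

```typescript
// Hypothetical model of the stop()/dispose() contract.
class ReusableAgent {
  private disposed = false;
  private controller: AbortController | null = null;

  // Begins a task and returns the signal that cancels it.
  start(): AbortSignal {
    if (this.disposed) {
      throw new Error("Agent has been disposed and cannot be reused");
    }
    this.controller = new AbortController();
    return this.controller.signal;
  }

  // Cancels the current task; the instance can start another one.
  stop(): void {
    this.controller?.abort();
    this.controller = null;
  }

  // Terminal: cancels any running task and blocks further use.
  dispose(): void {
    this.stop();
    this.disposed = true;
  }
}
```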

Features

Panel action button - The panel button now morphs between Stop (■) and Close (X) based on agent status
Error history - Errors and max-step failures are now recorded in history as AgentErrorEvent, making post-task analysis more complete

Bug Fixes

AbortError handling - AbortError is no longer retried by the LLM client, and shows a clean "Task stopped" message instead of a raw error stack
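Excluding `AbortError` from retries usually means checking the error's name before looping again. A minimal sketch of that idea, assuming the abort surfaces as an `Error` whose `name` is `"AbortError"`; `withRetry` is a hypothetical helper, not the LLM client's retry code:

```typescript
// Retry transient failures, but rethrow cancellations immediately.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts: number): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err instanceof Error && err.name === "AbortError") {
        throw err; // the user stopped the task; retrying would ignore that
      }
      lastError = err;
    }
  }
  throw lastError;
}
```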

[1.2.0] - 2026-02-11

Features

Observe Phase - Agent now observes the page before each action, improving decision accuracy on dynamic pages
Better Abort Handling - Improved abortSignal support for cleaner task cancellation

Improvements

Pruned system prompts for lower token usage and faster responses
Improved error handling during agent steps with better error messages
Zod tree-shaking for smaller bundle size

Bug Fixes

Fixed indentation lost in DOM extraction caused by trimLines
Fixed gpt-5-mini temperature configuration

[1.1.0] - 2026-02-02

Features

Custom System Prompt - New systemPrompt config option to customize or extend the default system prompt
Chrome Extension - Extension with multi-tab control, main-world API with token auth, and tab lifecycle management

Improvements

Renamed include_attributes to includeAttributes in PageController config (camelCase consistency)
Lazy-loaded mask module for faster initialization
Better date formatting and error messages from LLM client
Added rawRequest to step history for easier debugging

Bug Fixes

Fixed CSP errors by using local SVGs for cursor mask instead of inline styles
Fixed AbortError being incorrectly retried and shown to users
Fixed mask not working correctly when starting a new task after stopping a previous one

[1.0.0] - 2026-01-19

🎉 First Stable Release

PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.

Features

Core

PageAgent - Main entry class with built-in UI Panel
PageAgentCore - Headless agent class for custom UI or programmatic use
DOM Analysis - Text-based DOM extraction with high-intensity dehydration
LLM Support - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
Tool System - Built-in tools for click, input, scroll, select, and more
Custom Tools - Extend agent capabilities with your own tools (experimental)
Lifecycle Hooks - Hook into agent execution (experimental)
Instructions System - System-level and page-level instructions to guide agent behavior
Data Masking - Transform page content before sending to LLM

Page Controller

Element Interactions - Click, input text, select options, scroll
Visual Mask - Blocks user interaction during automation
DOM Tree Extraction - Efficient page structure extraction for LLM consumption

UI

Interactive Panel - Real-time task progress and agent thinking display
Ask User Tool - Agent can ask users for clarification
i18n Support - English and Chinese localization

Packages

| Package | Description |
| --- | --- |
| `page-agent` | Main entry with UI Panel |
| `@page-agent/core` | Core agent logic without UI |
| `@page-agent/llms` | LLM client with retry logic |
| `@page-agent/page-controller` | DOM operations and visual feedback |
| `@page-agent/ui` | Panel and i18n |

Known Limitations

Single-page application only (cannot navigate across pages)
No visual recognition (relies on DOM structure)
Limited interaction support (no hover, drag-drop, canvas operations)
See Limitations for details

Acknowledgments

This project builds upon the excellent work of browser-use. DOM processing components and prompts are adapted from browser-use (MIT License).
