Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.10.0] - 2026-06-15
Breaking Changes
- Agent run lifecycle rework - `stop()` is now async and resolves only after the run fully settles. Run status is decoupled from task outcome: a new `stopped` state was added, and LLM self-reported failures now end as `completed`. Lifecycle hooks re-throw instead of folding errors into the result, agent errors are recorded in history, and `agent.lastResult` was added.
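For callers, decoupling run status from task outcome means checking two things separately: how the run ended and whether the task succeeded. The sketch below illustrates that idea with hypothetical `RunStatus` and `RunResult` shapes. The `error` status and the `success` flag are assumptions made for illustration, not the library's actual types.

```typescript
// Hypothetical shapes for illustration only; the real page-agent types may differ.
type RunStatus = "completed" | "stopped" | "error";

interface RunResult {
  status: RunStatus; // how the run ended
  success: boolean;  // what the model reported about the task itself
}

// Describe a finished run without conflating lifecycle status and task outcome.
function summarizeRun(result: RunResult): string {
  if (result.status === "stopped") {
    return "run was stopped before finishing";
  }
  if (result.status === "error") {
    return "run ended with an agent error";
  }
  // status is "completed": the run settled normally, whatever the outcome.
  return result.success
    ? "run completed and the task succeeded"
    : "run completed, but the model reported a task failure";
}
```

Under these assumptions, a self-reported failure is an ordinary `completed` run with `success: false`, so code that only watched the status for failures would need to check the outcome as well.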
Features
- Abortable JavaScript execution - `execute_javascript` now honors the `AbortSignal`.
- Leaner agent prompts - Simplified the waiting-response flow and removed navigation-back instructions to reduce LLM cognitive load.
- MultiPageAgent safety - Disabled `ScriptExecutionTool` for `MultiPageAgent`.
Improvements
- Dark mode detection - Refined detection heuristics and made `isMainContentDark` less aggressive by checking `html` and `body` data attributes independently.
- Extension lifecycle robustness - Drove heartbeat and running state from status changes, cleared stale activity on any non-running status, handled the stopped lifecycle state, and cleared `currentTabId` on `TabsController.init`.
Bug Fixes
- Accurate wait reporting - Wait steps now report the actual wait duration.
- Scroll predicates - Scroll predicates now return booleans.
- Docs - Fixed the broken demo video on GitHub.
[1.9.0] - 2026-06-08
Features
- Robust abort handling - Rewrote the aborting system; sync tools, loop execution, and LLM clients now correctly respect task abort signals (`ctx.signal`). Also decoupled `AbortError` from `InvokeError` in `@page-agent/llms`.
- Claude Opus 4.8 support - Added support for the Claude Opus 4.8 model.
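A minimal sketch of how a synchronous, loop-based tool might respect an abort signal, using only the standard `AbortSignal` API. The `processItems` function and the error message are hypothetical; how the library wires `ctx.signal` into its own tools is not shown in this changelog.

```typescript
// Hypothetical tool body: checks the signal before each unit of work.
function processItems(items: string[], signal: AbortSignal): string[] {
  const done: string[] = [];
  for (const item of items) {
    if (signal.aborted) {
      // Raise an error named "AbortError" so callers can tell
      // cancellation apart from ordinary failures.
      const err = new Error("Task aborted");
      err.name = "AbortError";
      throw err;
    }
    done.push(item.toUpperCase());
  }
  return done;
}
```

Checking the signal once per iteration keeps cancellation latency bounded by the cost of a single step, which is the usual trade-off for synchronous work that cannot be interrupted midway.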
Improvements
- Concurrency guard - Prevented concurrent `execute()` calls on a single PageAgent/Core instance to avoid race conditions.
- Model recommendations refresh - Updated default and tested model list recommendations.
- Test coverage - Added comprehensive Vitest unit tests for the `@page-agent/llms` package.
- Improved documentation - Added website documentation for the `ctx.signal` abort contract and `execute()` concurrency rules.
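The concurrency guard can be pictured as a simple "already running" flag around `execute()`. The `GuardedRunner` class below is a hypothetical sketch. Whether the real implementation throws synchronously or rejects the returned promise is an assumption here, so check the project documentation for the actual contract.

```typescript
// Hypothetical guard: allows one in-flight execute() per instance.
class GuardedRunner {
  private running = false;

  get isRunning(): boolean {
    return this.running;
  }

  execute(task: () => Promise<string>): Promise<string> {
    if (this.running) {
      // Assumption: fail fast instead of queueing the second call.
      throw new Error("execute() is already running on this instance");
    }
    this.running = true;
    const release = () => {
      this.running = false;
    };
    // Release the flag whether the task resolves, rejects, or throws.
    return Promise.resolve()
      .then(task)
      .then(
        (value) => {
          release();
          return value;
        },
        (error) => {
          release();
          throw error;
        },
      );
  }
}
```

Releasing the flag on both the success and failure paths matters: otherwise one failed run would leave the instance permanently locked.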
Bug Fixes
- DTS bundle fix - Fixed a packaging bug where global type declarations were incorrectly bundled into `.d.ts` outputs.
- Website sidebar fix - Normalized trailing slashes in the website's sidebar location comparison.
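To illustrate the kind of comparison the sidebar fix refers to, here is a small sketch that treats `/docs/intro` and `/docs/intro/` as the same location. The helper names `normalizePath` and `isSameLocation` are hypothetical and are not taken from the website's source.

```typescript
// Strip trailing slashes, but keep the root path as "/".
function normalizePath(path: string): string {
  const trimmed = path.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Compare two locations while ignoring trailing-slash differences.
function isSameLocation(a: string, b: string): boolean {
  return normalizePath(a) === normalizePath(b);
}
```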
[1.8.2] - 2026-05-11
Features
- IIFE demo control - Added `showPanel` and `autoInit` switches to the IIFE CDN script to control whether the UI panel automatically displays or initializes on load.
Improvements
- Build toolchain modernization - Upgraded build infrastructure to Vite 8.
Bug Fixes
- TypeScript `InvokeErrorType` fix - Separated the value and type space for `InvokeErrorType` to resolve TypeScript compilation issues.
- Website chunking fix - Restored working code-splitting with `manualChunks` for the documentation website.
[1.8.1] - 2026-04-27
Features
- GPT-5.4 & Qwen 3.6 support - Added support for `gpt-5.4` and `qwen3.6-max/flash` in the recommended LLM list.
- Custom LLM request hook - Added a `transformRequestBody` hook to allow custom modification of payloads before sending requests to LLM providers.
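A hedged sketch of what a `transformRequestBody` hook could look like. The changelog does not give its signature, so the `(body) => body` shape, the `RequestBody` type, and the `temperature` field used here are assumptions for illustration only.

```typescript
// Assumed payload shape: a plain JSON-like object.
type RequestBody = Record<string, unknown>;

// Hypothetical hook: returns a modified copy and leaves the input untouched.
const transformRequestBody = (body: RequestBody): RequestBody => ({
  ...body,
  temperature: 0, // example override; any provider-specific field could go here
});
```

Returning a new object rather than mutating the argument keeps the hook safe to call even if the same payload object is reused for a retry.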
Improvements
- Accessibility (a11y) enhancements - Added descriptive accessible labels to `ConfigPanel` input fields and icon buttons in compliance with WCAG 4.1.2.
- UI polish - Improved `HistoryList` loading and empty states, and added helpful tooltips for actions.
- Prompt caching guidance - Added website documentation for prompt caching optimization.
- Build speedups - Added parallel build scripts to accelerate local development compilation.
Bug Fixes
- DeepSeek tool choice fix - Disabled explicit `tool_choice` for DeepSeek models to avoid API compatibility errors.
- MCP version advertising - MCP server now correctly advertises its package version.
[1.8.0] - 2026-04-15
Breaking Changes
- TypeScript 6 & ESLint 10 upgrade - Major toolchain modernization. Upgraded the entire monorepo to TypeScript 6 and ESLint 10 with source-first monorepo resolution (library exports resolve to source files directly during local development).
Improvements
- MCP security hardening - Bound the MCP HTTP + WebSocket server to `localhost` only.
- Extension UI refinement - Made the history panel height responsive to the viewport and improved the result card readability by increasing font size.
Bug Fixes
- SimulatorMask memory leak - Fixed a memory leak by ensuring the `requestAnimationFrame` loop is cancelled when disposing the SimulatorMask.
- Autofixer format fix - Corrected the fallback action format for `autoFixer` when waiting.
- IIFE scope protection - Fixed a name collision in IIFE builds by preventing global helper function re-declarations.
[1.7.1] - 2026-04-04
Features
- Optional `keepSemanticTags` - Added an experimental `keepSemanticTags` config to preserve semantic structure in PageController output
- Per-task extension system instructions - Extension `ExecuteConfig` now supports `systemInstruction`
Improvements
- Smarter scroll handling - Scroll container detection and scroll direction handling are more reliable
- Better accessibility-aware element detection - Interactive candidates with supported ARIA attributes and `role="listitem"` are recognized more accurately
Bug Fixes
- Fixed iframe-origin filtering for extension `postMessage` listeners
- Avoided a `currentScript` null pointer during deferred initialization
[1.7.0] - 2026-03-31
- More reliable click actions - Click handling now reuses pointer coordinates, verifies targets with `elementFromPoint`, and behaves better on layered layouts
- Better mask event handling - `SimulatorMask` now supports passthrough events when automation should not fully swallow input
- Fixed a `SimulatorMask` memory leak
[1.6.3] - 2026-03-30
Features
- Experimental all-tabs control - Extension can include and control all browser tabs via `experimentalIncludeAllTabs`
Improvements
- Calmer empty state motion - Disabled the EmptyState auto-start animation in the extension UI
- Cleaner extension docs - Simplified setup and tab-control documentation across the README and developer guide
Bug Fixes
- Fixed new-tab detection from content scripts
- Fixed tab deduplication and multi-window handling in the extension
[1.6.2] - 2026-03-25
- Longer task input - The UI task input now accepts up to 1000 characters
- Contributor docs refresh - Added a maintainer note and refreshed contributor-facing documentation
- Fixed lint issues in the release pipeline
[1.6.1] - 2026-03-22
- Internal PageController action exports - PageController actions are now exposed as internal methods for easier reuse across packages
- Expanded docs - Added MCP docs and clarified project limitations and homepage details
[1.6.0] - 2026-03-21
Features
- Beta MCP support - New `@page-agent/mcp` package lets MCP clients such as Claude Desktop and Copilot control the browser through the Page Agent extension
- Better iframe handling - Same-origin iframe elements are handled more reliably during DOM extraction and actions
- Extension history workflows - Users can rerun past tasks, export history sessions as JSON, and approve MCP-triggered tasks before execution
Improvements
- Unified versioning across packages - The extension now follows the root workspace version. Changelog entries are no longer split into a separate extension version section
- Configurable `stepDelay` - Agent pacing between steps is now configurable via `stepDelay`
- Optional API key - `apiKey` can now be omitted for compatible deployments that do not require one
- Optional named tool choice - Tool invocation can disable named tool choice for providers that behave better without it
- Better rich-text input support - Improved `contenteditable` handling with better event dispatching and `execCommand` fallback for more editors
- More flexible DOM extraction - `includeAttributes` now supports wildcards, `contenteditable` is included by default, and heuristically interactive elements expose more useful attributes
- MiniMax model support - Added MiniMax compatibility, with the default recommendation updated to `MiniMax-M2.7`
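As an illustration of wildcard support in an attribute allow-list such as `includeAttributes`, the sketch below matches attribute names against glob-style patterns. The `*` syntax and the `matchesAttribute` helper are assumptions; the changelog does not specify the exact wildcard grammar the library accepts.

```typescript
// Hypothetical matcher: "*" in a pattern matches any run of characters.
function matchesAttribute(pattern: string, name: string): boolean {
  // Escape regex metacharacters, then turn each "*" into ".*".
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(name);
}

// Keep only the attribute names allowed by at least one pattern.
function filterAttributes(patterns: string[], names: string[]): string[] {
  return names.filter((name) => patterns.some((p) => matchesAttribute(p, name)));
}
```

For example, under these assumptions the patterns `["data-*", "role"]` would keep `data-testid` and `role` while dropping `class`.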
Bug Fixes
- Fixed Safari issues when `requestIdleCallback` is unavailable
- Avoided throwing when `webgl2` initialization fails
- Improved OpenAI-compatible request patches for GPT-5.4 chat tools and MiniMax temperature/tool-call compatibility
- Fixed several UI polish issues in the extension and website, including cursor and layout regressions
[1.5.1] - 2026-03-05
Breaking Changes
- `data-browser-use-ignore` → `data-page-agent-ignore` - DOM ignore attribute renamed to match the project identity
- Config types restructured - `PageAgentConfig` split into `AgentConfig` + `PageAgentCoreConfig`; config definitions moved from `config/index.ts` to `types.ts`
- Zod v3/v4 dual support - Libraries now accept both `zod@^3.25` and `zod@^4.0` as peer dependencies
Features
- Experimental `llms.txt` support - Agent can fetch and include a site's `llms.txt` in context. Enable via `experimentalLlmsTxt: true`
Improvements
- Default `maxSteps` changed from 20 to 40 for better handling of complex tasks out of the box
- Added 400ms wait between agent steps for page reactions
- Increased click wait time (100ms → 200ms) for more reliable interactions
- Removed debug `console.log` statements from scroll actions
- Reset observations on new task start
- Improved logging across packages
Extension v0.1.9
PageAgent 1.5.1
- Advanced config panel - New collapsible section exposing Max Steps, System Instruction, and experimental `llms.txt` toggle
- Streamlined User Auth Token description
- Moved testing API notice below auth token section
[1.4.0] - 2026-02-27
Features
- Update Terms of Use and Privacy Policy
- Robust tool-call validation - Action inputs are now validated against tool schemas individually, producing clear error messages (e.g. `Invalid input for action "click_element_by_index"`) instead of unreadable union parse errors
- Primitive action input coercion - Small models that output `{"click_element_by_index": 2}` instead of `{"click_element_by_index": {"index": 2}}` are now auto-corrected using tool schemas
- Qwen model updates - Added `qwen3.5-plus` as the default free testing model; disabled `enable_thinking` for Qwen models to avoid incompatible responses
- Updated default LLM endpoint - Migrated demo and extension to a new testing endpoint with legacy endpoint auto-migration
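The primitive input coercion described above can be sketched as follows. This is a simplified illustration, not the library's implementation: it replaces the real tool schemas with a hypothetical `singleFieldOf` lookup that names the one input field of each single-field action.

```typescript
// Hypothetical stand-in for tool schemas: action name -> its only input field.
const singleFieldOf: Record<string, string> = {
  click_element_by_index: "index",
};

// Wrap bare primitive inputs into the object shape the action expects.
function coerceAction(action: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [name, input] of Object.entries(action)) {
    const field = singleFieldOf[name];
    const isPrimitive = input === null || typeof input !== "object";
    // Only coerce when the action is known to take exactly one field.
    out[name] = field !== undefined && isPrimitive ? { [field]: input } : input;
  }
  return out;
}
```

Inputs that are already objects pass through unchanged, so well-formed model output is not affected by the correction step.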
Improvements
- Unified zod imports (`* as z`) across all packages for consistency
- Better Zod error formatting with `z.prettifyError()` in LLM client
- Exported `InvokeError` and `InvokeErrorType` as values (not just types) from `@page-agent/llms`
- Exported `SupportedLanguage` type from `@page-agent/core`
Extension v0.1.8
- Language setting - Added language selector (System / English / 中文) in config panel
- UI makeover - New empty state with breathing glow and typing animation; ai-motion glow overlay while running; refined focus styles
- Testing endpoint notice - Shows terms of use notice when using the free testing API
- Legacy endpoint migration - Auto-migrates old Supabase testing endpoint to new endpoint on startup
[1.3.0] - 2026-02-13
Breaking Changes
- Lifecycle: `stop()` vs `dispose()` - New `stop()` method to cancel the current task while keeping the agent reusable. `dispose()` is now terminal: a disposed agent cannot be reused. This affects both `PageAgentCore` and `PanelAgentAdapter`.
Features
- Panel action button - The panel button now morphs between Stop (■) and Close (X) based on agent status
- Error history - Errors and max-step failures are now recorded in `history` as `AgentErrorEvent`, making post-task analysis more complete
Bug Fixes
- AbortError handling - `AbortError` is no longer retried by the LLM client, and shows a clean "Task stopped" message instead of a raw error stack
[1.2.0] - 2026-02-11
Features
- Observe Phase - Agent now observes the page before each action, improving decision accuracy on dynamic pages
- Better Abort Handling - Improved `abortSignal` support for cleaner task cancellation
Improvements
- Pruned system prompts for lower token usage and faster responses
- Improved error handling during agent steps with better error messages
- Zod tree-shaking for smaller bundle size
Bug Fixes
- Fixed indentation lost in DOM extraction caused by `trimLines`
- Fixed `gpt-5-mini` temperature configuration
[1.1.0] - 2026-02-02
Features
- Custom System Prompt - New `systemPrompt` config option to customize or extend the default system prompt
- Chrome Extension - Extension with multi-tab control, main-world API with token auth, and tab lifecycle management
Improvements
- Renamed `include_attributes` to `includeAttributes` in PageController config (camelCase consistency)
- Lazy-loaded mask module for faster initialization
- Better date formatting and error messages from LLM client
- Added `rawRequest` to step history for easier debugging
Bug Fixes
- Fixed CSP errors by using local SVGs for cursor mask instead of inline styles
- Fixed `AbortError` being incorrectly retried and shown to users
- Fixed mask not working correctly when starting a new task after stopping a previous one
[1.0.0] - 2026-01-19
🎉 First Stable Release
PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.
Features
Core
- PageAgent - Main entry class with built-in UI Panel
- PageAgentCore - Headless agent class for custom UI or programmatic use
- DOM Analysis - Text-based DOM extraction with high-intensity dehydration
- LLM Support - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
- Tool System - Built-in tools for click, input, scroll, select, and more
- Custom Tools - Extend agent capabilities with your own tools (experimental)
- Lifecycle Hooks - Hook into agent execution (experimental)
- Instructions System - System-level and page-level instructions to guide agent behavior
- Data Masking - Transform page content before sending to LLM
Page Controller
- Element Interactions - Click, input text, select options, scroll
- Visual Mask - Blocks user interaction during automation
- DOM Tree Extraction - Efficient page structure extraction for LLM consumption
UI
- Interactive Panel - Real-time task progress and agent thinking display
- Ask User Tool - Agent can ask users for clarification
- i18n Support - English and Chinese localization
Packages
| Package | Description |
|---|---|
| `page-agent` | Main entry with UI Panel |
| `@page-agent/core` | Core agent logic without UI |
| `@page-agent/llms` | LLM client with retry logic |
| `@page-agent/page-controller` | DOM operations and visual feedback |
| `@page-agent/ui` | Panel and i18n |
Known Limitations
- Single-page application only (cannot navigate across pages)
- No visual recognition (relies on DOM structure)
- Limited interaction support (no hover, drag-drop, canvas operations)
- See Limitations for details
Acknowledgments
This project builds upon the excellent work of browser-use. DOM processing components and prompts are adapted from browser-use (MIT License).