Files

Simon 09bdf9ddaf feat(core): experimental support for llms.txt (#157 )

* feat(core): experimental support for llms.txt

* docs: experimentalLlmsTxt

2026-03-04 18:53:24 +08:00

7.2 KiB

Raw Blame History

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.4.0] - 2026-02-27

Features

Update Terms of Use and Privacy Policy
Robust tool-call validation - Action inputs are now validated against tool schemas individually, producing clear error messages (e.g. Invalid input for action "click_element_by_index") instead of unreadable union parse errors
Primitive action input coercion - Small models that output {"click_element_by_index": 2} instead of {"click_element_by_index": {"index": 2}} are now auto-corrected using tool schemas
Qwen model updates - Added qwen3.5-plus as the default free testing model; disabled enable_thinking for Qwen models to avoid incompatible responses
Updated default LLM endpoint - Migrated demo and extension to a new testing endpoint with legacy endpoint auto-migration

Improvements

Unified zod imports (* as z) across all packages for consistency
Better Zod error formatting with z.prettifyError() in LLM client
Exported InvokeError and InvokeErrorType as values (not just types) from @page-agent/llms
Exported SupportedLanguage type from @page-agent/core

Extension v0.1.8

Language setting - Added language selector (System / English / 中文) in config panel
UI makeover - New empty state with breathing glow and typing animation; ai-motion glow overlay while running; refined focus styles
Testing endpoint notice - Shows terms of use notice when using the free testing API
Legacy endpoint migration - Auto-migrates old Supabase testing endpoint to new endpoint on startup

[1.3.0] - 2026-02-13

Breaking Changes

Lifecycle: stop() vs dispose() - New stop() method to cancel the current task while keeping the agent reusable. dispose() is now terminal — a disposed agent cannot be reused. This affects both PageAgentCore and PanelAgentAdapter.

Features

Panel action button - The panel button now morphs between Stop (■) and Close (X) based on agent status
Error history - Errors and max-step failures are now recorded in history as AgentErrorEvent, making post-task analysis more complete

Bug Fixes

AbortError handling - AbortError is no longer retried by the LLM client, and shows a clean "Task stopped" message instead of a raw error stack

[1.2.0] - 2026-02-11

Features

Observe Phase - Agent now observes the page before each action, improving decision accuracy on dynamic pages
Better Abort Handling - Improved abortSignal support for cleaner task cancellation

Improvements

Pruned system prompts for lower token usage and faster responses
Improved error handling during agent steps with better error messages
Zod tree-shaking for smaller bundle size

Bug Fixes

Fixed indentation lost in DOM extraction caused by trimLines
Fixed gpt-5-mini temperature configuration

[1.1.0] - 2026-02-02

Features

Custom System Prompt - New systemPrompt config option to customize or extend the default system prompt
Chrome Extension - Extension with multi-tab control, main-world API with token auth, and tab lifecycle management

Improvements

Renamed include_attributes to includeAttributes in PageController config (camelCase consistency)
Lazy-loaded mask module for faster initialization
Better date formatting and error messages from LLM client
Added rawRequest to step history for easier debugging

Bug Fixes

Fixed CSP errors by using local SVGs for cursor mask instead of inline styles
Fixed AbortError being incorrectly retried and shown to users
Fixed mask not working correctly when starting a new task after stopping a previous one

[1.0.0] - 2026-01-19

🎉 First Stable Release

PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.

Features

Core

PageAgent - Main entry class with built-in UI Panel
PageAgentCore - Headless agent class for custom UI or programmatic use
DOM Analysis - Text-based DOM extraction with high-intensity dehydration
LLM Support - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
Tool System - Built-in tools for click, input, scroll, select, and more
Custom Tools - Extend agent capabilities with your own tools (experimental)
Lifecycle Hooks - Hook into agent execution (experimental)
Instructions System - System-level and page-level instructions to guide agent behavior
Data Masking - Transform page content before sending to LLM

Page Controller

Element Interactions - Click, input text, select options, scroll
Visual Mask - Blocks user interaction during automation
DOM Tree Extraction - Efficient page structure extraction for LLM consumption

UI

Interactive Panel - Real-time task progress and agent thinking display
Ask User Tool - Agent can ask users for clarification
i18n Support - English and Chinese localization

Configuration

// Version 1.0.0
interface PageAgentConfig {
    // LLM Configuration (required)
    baseURL: string
    apiKey: string
    model: string
    temperature?: number
    maxRetries?: number
    customFetch?: typeof fetch

    // Agent Configuration
    language?: 'en-US' | 'zh-CN'
    maxSteps?: number // default: 20
    customTools?: Record<string, PageAgentTool> // experimental
    instructions?: InstructionsConfig
    transformPageContent?: (content: string) => string | Promise<string>
    experimentalScriptExecutionTool?: boolean // default: false

    // Lifecycle Hooks (experimental)
    onBeforeTask?: (agent, result) => void
    onAfterTask?: (agent, result) => void
    onBeforeStep?: (agent, stepCount) => void
    onAfterStep?: (agent, history) => void
    onDispose?: (agent, reason?) => void

    // Page Controller Configuration
    enableMask?: boolean // default: true
    viewportExpansion?: number
    interactiveBlacklist?: Element[]
    interactiveWhitelist?: Element[]
}

Packages

Package	Description
`page-agent`	Main entry with UI Panel
`@page-agent/core`	Core agent logic without UI
`@page-agent/llms`	LLM client with retry logic
`@page-agent/page-controller`	DOM operations and visual feedback
`@page-agent/ui`	Panel and i18n

Known Limitations

Single-page application only (cannot navigate across pages)
No visual recognition (relies on DOM structure)
Limited interaction support (no hover, drag-drop, canvas operations)
See Limitations for details

Acknowledgments

This project builds upon the excellent work of browser-use. DOM processing components and prompts are adapted from browser-use (MIT License).

7.2 KiB Raw Blame History

Changelog

[1.4.0] - 2026-02-27

Features

Improvements

Extension v0.1.8

[1.3.0] - 2026-02-13

Breaking Changes

Features

Bug Fixes

[1.2.0] - 2026-02-11

Features

Improvements

Bug Fixes

[1.1.0] - 2026-02-02

Features

Improvements

Bug Fixes

[1.0.0] - 2026-01-19

🎉 First Stable Release

Features

Core

Page Controller

UI

Configuration

Packages

Known Limitations

Acknowledgments

7.2 KiB

Raw Blame History