chore(docs): move docs

This commit is contained in:
Simon
2026-02-24 00:13:36 +08:00
parent 8cb57befb3
commit d913dd7bd0
10 changed files with 22 additions and 27 deletions

153
docs/CHANGELOG.md Normal file
View File

@@ -0,0 +1,153 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.3.0] - 2026-02-13
### Breaking Changes
- **Lifecycle: `stop()` vs `dispose()`** - New `stop()` method to cancel the current task while keeping the agent reusable. `dispose()` is now terminal — a disposed agent cannot be reused. This affects both `PageAgentCore` and `PanelAgentAdapter`.
### Features
- **Panel action button** - The panel button now morphs between Stop (■) and Close (X) based on agent status
- **Error history** - Errors and max-step failures are now recorded in `history` as `AgentErrorEvent`, making post-task analysis more complete
### Bug Fixes
- **AbortError handling** - `AbortError` is no longer retried by the LLM client, and shows a clean "Task stopped" message instead of a raw error stack
---
## [1.2.0] - 2026-02-11
### Features
- **Observe Phase** - Agent now observes the page before each action, improving decision accuracy on dynamic pages
- **Better Abort Handling** - Improved `abortSignal` support for cleaner task cancellation
### Improvements
- Pruned system prompts for lower token usage and faster responses
- Improved error handling during agent steps with better error messages
- Zod tree-shaking for smaller bundle size
### Bug Fixes
- Fixed indentation lost in DOM extraction caused by `trimLines`
- Fixed `gpt-5-mini` temperature configuration
---
## [1.1.0] - 2026-02-02
### Features
- **Custom System Prompt** - New `systemPrompt` config option to customize or extend the default system prompt
- **Chrome Extension** - Extension with multi-tab control, main-world API with token auth, and tab lifecycle management
### Improvements
- Renamed `include_attributes` to `includeAttributes` in PageController config (camelCase consistency)
- Lazy-loaded mask module for faster initialization
- Better date formatting and error messages from LLM client
- Added `rawRequest` to step history for easier debugging
### Bug Fixes
- Fixed CSP errors by using local SVGs for cursor mask instead of inline styles
- Fixed `AbortError` being incorrectly retried and shown to users
- Fixed mask not working correctly when starting a new task after stopping a previous one
---
## [1.0.0] - 2026-01-19
### 🎉 First Stable Release
PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.
### Features
#### Core
- **PageAgent** - Main entry class with built-in UI Panel
- **PageAgentCore** - Headless agent class for custom UI or programmatic use
- **DOM Analysis** - Text-based DOM extraction with high-intensity dehydration
- **LLM Support** - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
- **Tool System** - Built-in tools for click, input, scroll, select, and more
- **Custom Tools** - Extend agent capabilities with your own tools (experimental)
- **Lifecycle Hooks** - Hook into agent execution (experimental)
- **Instructions System** - System-level and page-level instructions to guide agent behavior
- **Data Masking** - Transform page content before sending to LLM
#### Page Controller
- **Element Interactions** - Click, input text, select options, scroll
- **Visual Mask** - Blocks user interaction during automation
- **DOM Tree Extraction** - Efficient page structure extraction for LLM consumption
#### UI
- **Interactive Panel** - Real-time task progress and agent thinking display
- **Ask User Tool** - Agent can ask users for clarification
- **i18n Support** - English and Chinese localization
### Configuration
```typescript
interface PageAgentConfig {
// LLM Configuration (required)
baseURL: string
apiKey: string
model: string
temperature?: number
maxRetries?: number
customFetch?: typeof fetch
// Agent Configuration
language?: 'en-US' | 'zh-CN'
maxSteps?: number // default: 20
customTools?: Record<string, PageAgentTool> // experimental
instructions?: InstructionsConfig
transformPageContent?: (content: string) => string | Promise<string>
experimentalScriptExecutionTool?: boolean // default: false
// Lifecycle Hooks (experimental)
onBeforeTask?: (agent, result) => void
onAfterTask?: (agent, result) => void
onBeforeStep?: (agent, stepCount) => void
onAfterStep?: (agent, history) => void
onDispose?: (agent, reason?) => void
// Page Controller Configuration
enableMask?: boolean // default: true
viewportExpansion?: number
interactiveBlacklist?: Element[]
interactiveWhitelist?: Element[]
}
```
### Packages
| Package | Description |
| ----------------------------- | ---------------------------------- |
| `page-agent` | Main entry with UI Panel |
| `@page-agent/core` | Core agent logic without UI |
| `@page-agent/llms` | LLM client with retry logic |
| `@page-agent/page-controller` | DOM operations and visual feedback |
| `@page-agent/ui` | Panel and i18n |
### Known Limitations
- Single-page application only (cannot navigate across pages)
- No visual recognition (relies on DOM structure)
- Limited interaction support (no hover, drag-drop, canvas operations)
- See [Limitations](https://alibaba.github.io/page-agent/#/docs/introduction/limitations) for details
### Acknowledgments
This project builds upon the excellent work of [browser-use](https://github.com/browser-use/browser-use). DOM processing components and prompts are adapted from browser-use (MIT License).

127
docs/CODE_OF_CONDUCT.md Normal file
View File

@@ -0,0 +1,127 @@
# Alibaba Open Source Code of Conduct
[¶中文版](#我们的保证)
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at opensource@alibaba-inc.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
---
> Chinese Version
> 《阿里巴巴开源行为准则》
## 我们的保证
为了促进一个开放透明且友好的环境,我们作为贡献者和维护者保证:无论年龄、种族、民族、性别认同和表达(方式)、体型、身体健全与否、经验水平、国籍、个人表现、宗教或性别取向,参与者在我们项目和社区中都免于骚扰。
## 我们的标准
有助于创造正面环境的行为包括但不限于:
* 使用友好和包容性语言
* 尊重不同的观点和经历
* 耐心地接受建设性批评
* 关注对社区最有利的事情
* 友善对待其他社区成员
身为参与者不能接受的行为包括但不限于:
* 使用与性有关的言语或是图像,以及不受欢迎的性骚扰
* 捣乱/煽动/造谣的行为或进行侮辱/贬损的评论,人身攻击及政治攻击
* 公开或私下的骚扰
* 未经许可地发布他人的个人资料,例如住址或是电子地址
* 其他可以被合理地认定为不恰当或者违反职业操守的行为
## 我们的责任
项目维护者有责任为「可接受的行为」标准做出诠释,以及对已发生的不被接受的行为采取恰当且公平的纠正措施。
项目维护者有权利及责任去删除、编辑、拒绝与本行为标准有所违背的评论 (comments)、提交 (commits)、代码、wiki 编辑、问题 (issues) 和其他贡献,以及项目维护者可暂时或永久性的禁止任何他们认为有不适当、威胁、冒犯、有害行为的贡献者。
## 使用范围
当一个人代表该项目或是其社区时,本行为标准适用于其项目平台和公共平台。
代表项目或是社区的情况,举例来说包括使用官方项目的电子邮件地址、通过官方的社区媒体账号发布或线上或线下事件中担任指定代表。
该项目的呈现方式可由其项目维护者进行进一步的定义及解释。
## 强制执行
可以通过 opensource@alibaba-inc.com 来联系项目团队来举报滥用、骚扰或其他不被接受的行为。
任何维护团队认为有必要且适合的所有投诉都将进行审查及调查,并做出相对应的回应。项目小组有对事件回报者有保密的义务。具体执行的方针近一步细节可能会单独公布。
没有切实地遵守或是执行本行为标准的项目维护人员,可能会因项目领导人或是其他成员的决定,暂时或是永久地取消其参与资格。
## 来源
本行为标准改编自[贡献者公约](https://www.contributor-covenant.org),版本 1.4
可在此查看[https://www.contributor-covenant.org/zh-cn/version/1/4/code-of-conduct.html](https://www.contributor-covenant.org/zh-cn/version/1/4/code-of-conduct.html)

113
docs/README-zh.md Normal file
View File

@@ -0,0 +1,113 @@
# Page Agent
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://img.alicdn.com/imgextra/i4/O1CN01qKig1P1FnhpFKNdi6_!!6000000000532-2-tps-1280-256.png">
<img alt="Page Agent Banner" src="https://img.alicdn.com/imgextra/i1/O1CN01NCMKXj1Gn4tkFTsxf_!!6000000000666-2-tps-1280-256.png">
</picture>
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![TypeScript](https://img.shields.io/badge/%3C%2F%3E-TypeScript-%230074c1.svg)](http://www.typescriptlang.org/) [![Downloads](https://img.shields.io/npm/dt/page-agent.svg)](https://www.npmjs.com/package/page-agent) [![Bundle Size](https://img.shields.io/bundlephobia/minzip/page-agent)](https://bundlephobia.com/package/page-agent) [![GitHub stars](https://img.shields.io/github/stars/alibaba/page-agent.svg)](https://github.com/alibaba/page-agent)
纯 JS 实现的 GUI agent。使用自然语言操作你的 Web 应用。无须后端、客户端、浏览器插件。
🌐 [English](../README.md) | **中文**
👉 <a href="https://alibaba.github.io/page-agent/" target="_blank"><b>🚀 Demo</b></a> | <a href="https://alibaba.github.io/page-agent/#/docs/introduction/overview" target="_blank"><b>📖 Documentation</b></a>
<video id="demo-video" src="https://github.com/user-attachments/assets/34d8444d-cbfb-44a3-a24e-fd5c167bb0bf" controls crossorigin muted></video>
---
## ✨ Features
- **🎯 轻松集成**
- 无需 `浏览器插件` / `Python` / `无头浏览器`
- 纯页面内 JavaScript一切都在你的网页中完成。
- The best tool for your agent to control web pages.
- **📖 基于文本的 DOM 操作**
- 无需截图,无需 OCR 或多模态模型。
- 无需特殊权限。
- **🧠 用你自己的 LLM**
- **🎨 精美 UI支持人机协同**
- **🐙 可选的 [Chrome 扩展](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension),支持跨页面任务。**
## 💡 应用场景
- **SaaS AI 副驾驶** — 几行代码为你的产品加上 AI 副驾驶,不需要重写后端。
- **智能表单填写** — 把 20 次点击变成一句话。ERP、CRM、管理后台的最佳拍档。
- **无障碍增强** — 用自然语言让任何网页无障碍。语音指令、屏幕阅读器,零门槛。
- **跨页面 Agent** — 通过可选的 [Chrome 扩展](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension),让你的 Agent 跨标签页工作。
## 🚀 快速开始
### 一行代码集成
通过我们免费的 Demo LLM 快速体验 PageAgent
```html
<script src="{URL}" crossorigin="true"></script>
```
| Mirrors | URL |
| ------- | ---------------------------------------------------------------------------------- |
| Global | https://cdn.jsdelivr.net/npm/page-agent@1.3.0/dist/iife/page-agent.demo.js |
| China | https://registry.npmmirror.com/page-agent/1.3.0/files/dist/iife/page-agent.demo.js |
> **⚠️ 仅用于技术评估。** Demo LLM 有速率和使用限制,速度较慢,可能随时变更。
### NPM 安装
```bash
npm install page-agent
```
```javascript
import { PageAgent } from 'page-agent'
const agent = new PageAgent({
model: 'deepseek-chat',
baseURL: 'https://api.deepseek.com',
apiKey: 'YOUR_API_KEY',
language: 'zh-CN',
})
await agent.execute('点击登录按钮')
```
更多编程用法,请参阅 [📖 文档](https://alibaba.github.io/page-agent/#/docs/introduction/overview)。
## 🤝 贡献
欢迎社区贡献!请参阅 [CONTRIBUTING.md](../CONTRIBUTING.md) 了解环境配置和本地开发说明。
请在贡献前阅读[行为准则](CODE_OF_CONDUCT.md)。
## 👏 致谢
本项目基于 **[`browser-use`](https://github.com/browser-use/browser-use)** 的优秀工作构建。
`PageAgent` 专为**客户端网页增强**设计,不是服务端自动化工具。
```
DOM processing components and prompt are derived from browser-use:
Browser Use
Copyright (c) 2024 Gregor Zunic
Licensed under the MIT License
Original browser-use project: <https://github.com/browser-use/browser-use>
We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make
this project possible.
Third-party dependencies and their licenses can be found in the package.json
file and in the node_modules directory after installation.
```
## 📄 许可证
[MIT License](../LICENSE)
---
**⭐ 如果觉得 PageAgent 有用或有趣,请给项目点个星!**