chore(docs): move docs
153
docs/CHANGELOG.md
Normal file
@@ -0,0 +1,153 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.3.0] - 2026-02-13

### Breaking Changes

- **Lifecycle: `stop()` vs `dispose()`** - New `stop()` method to cancel the current task while keeping the agent reusable. `dispose()` is now terminal: a disposed agent cannot be reused. This affects both `PageAgentCore` and `PanelAgentAdapter`.
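
The distinction can be illustrated with a minimal sketch. `SketchAgent`, `start()`, `getStatus()`, and `getTask()` are hypothetical names invented for this example; this is not the `PageAgentCore` or `PanelAgentAdapter` implementation, and only the `stop()`/`dispose()` semantics are taken from the entry above.

```typescript
// Hypothetical stand-in for an agent; NOT the real PageAgentCore implementation.
type Status = 'idle' | 'running' | 'disposed'

class SketchAgent {
  private status: Status = 'idle'
  private currentTask: string | null = null

  getStatus(): Status {
    return this.status
  }

  getTask(): string | null {
    return this.currentTask
  }

  // Begin a task; a disposed agent refuses new work.
  start(task: string): void {
    if (this.status === 'disposed') {
      throw new Error('Agent has been disposed and cannot be reused')
    }
    this.currentTask = task
    this.status = 'running'
  }

  // Cancel the current task but keep the agent reusable.
  stop(): void {
    if (this.status === 'running') {
      this.currentTask = null
      this.status = 'idle'
    }
  }

  // Terminal: after this call no further tasks are accepted.
  dispose(): void {
    this.currentTask = null
    this.status = 'disposed'
  }
}

const agent = new SketchAgent()
agent.start('first task')
agent.stop() // cancels the task only
agent.start('second task') // allowed: stop() left the agent reusable
agent.dispose() // terminal: a later start() would throw in this sketch
```

In this sketch a task started after `dispose()` throws; how the real classes signal that misuse is not stated in this changelog.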

### Features

- **Panel action button** - The panel button now morphs between Stop (■) and Close (X) based on agent status
- **Error history** - Errors and max-step failures are now recorded in `history` as `AgentErrorEvent`, making post-task analysis more complete

### Bug Fixes

- **AbortError handling** - `AbortError` is no longer retried by the LLM client, and shows a clean "Task stopped" message instead of a raw error stack
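
The behavior described in this fix can be sketched roughly as follows. The helper names `isAbortError`, `shouldRetry`, and `toUserMessage` are assumptions made for illustration and are not part of the LLM client's API; only the rule "do not retry `AbortError`, show Task stopped" comes from the entry above.

```typescript
// Hypothetical helpers; the names are illustrative, not the library's API.

// True when the error signals a deliberate cancellation,
// such as the error raised when an AbortSignal fires.
function isAbortError(error: unknown): boolean {
  return error instanceof Error && error.name === 'AbortError'
}

// Decide whether a failed request is worth another attempt.
function shouldRetry(error: unknown, attempt: number, maxRetries: number): boolean {
  if (isAbortError(error)) return false // the user asked to stop; never retry
  return attempt < maxRetries
}

// Turn an error into the text shown to the user.
function toUserMessage(error: unknown): string {
  if (isAbortError(error)) return 'Task stopped'
  return error instanceof Error ? error.message : String(error)
}
```

Checking `error.name` rather than a specific class keeps the sketch independent of the runtime that produced the error.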

---

## [1.2.0] - 2026-02-11

### Features

- **Observe Phase** - Agent now observes the page before each action, improving decision accuracy on dynamic pages
- **Better Abort Handling** - Improved `abortSignal` support for cleaner task cancellation

### Improvements

- Pruned system prompts for lower token usage and faster responses
- Improved error handling during agent steps with better error messages
- Zod tree-shaking for smaller bundle size

### Bug Fixes

- Fixed indentation lost in DOM extraction caused by `trimLines`
- Fixed `gpt-5-mini` temperature configuration

---

## [1.1.0] - 2026-02-02

### Features

- **Custom System Prompt** - New `systemPrompt` config option to customize or extend the default system prompt
- **Chrome Extension** - Extension with multi-tab control, main-world API with token auth, and tab lifecycle management

### Improvements

- Renamed `include_attributes` to `includeAttributes` in PageController config (camelCase consistency)
- Lazy-loaded mask module for faster initialization
- Better date formatting and error messages from LLM client
- Added `rawRequest` to step history for easier debugging

### Bug Fixes

- Fixed CSP errors by using local SVGs for cursor mask instead of inline styles
- Fixed `AbortError` being incorrectly retried and shown to users
- Fixed mask not working correctly when starting a new task after stopping a previous one

---

## [1.0.0] - 2026-01-19

### 🎉 First Stable Release

PageAgent is now ready for production use. The API is stable and breaking changes will follow semantic versioning.

### Features

#### Core

- **PageAgent** - Main entry class with built-in UI Panel
- **PageAgentCore** - Headless agent class for custom UI or programmatic use
- **DOM Analysis** - Text-based DOM extraction with high-intensity dehydration
- **LLM Support** - Works with OpenAI, Claude, DeepSeek, Qwen, and other OpenAI-compatible APIs
- **Tool System** - Built-in tools for click, input, scroll, select, and more
- **Custom Tools** - Extend agent capabilities with your own tools (experimental)
- **Lifecycle Hooks** - Hook into agent execution (experimental)
- **Instructions System** - System-level and page-level instructions to guide agent behavior
- **Data Masking** - Transform page content before sending to LLM

#### Page Controller

- **Element Interactions** - Click, input text, select options, scroll
- **Visual Mask** - Blocks user interaction during automation
- **DOM Tree Extraction** - Efficient page structure extraction for LLM consumption

#### UI

- **Interactive Panel** - Real-time task progress and agent thinking display
- **Ask User Tool** - Agent can ask users for clarification
- **i18n Support** - English and Chinese localization

### Configuration

```typescript
interface PageAgentConfig {
  // LLM Configuration (required)
  baseURL: string
  apiKey: string
  model: string
  temperature?: number
  maxRetries?: number
  customFetch?: typeof fetch

  // Agent Configuration
  language?: 'en-US' | 'zh-CN'
  maxSteps?: number // default: 20
  customTools?: Record<string, PageAgentTool> // experimental
  instructions?: InstructionsConfig
  transformPageContent?: (content: string) => string | Promise<string>
  experimentalScriptExecutionTool?: boolean // default: false

  // Lifecycle Hooks (experimental)
  onBeforeTask?: (agent, result) => void
  onAfterTask?: (agent, result) => void
  onBeforeStep?: (agent, stepCount) => void
  onAfterStep?: (agent, history) => void
  onDispose?: (agent, reason?) => void

  // Page Controller Configuration
  enableMask?: boolean // default: true
  viewportExpansion?: number
  interactiveBlacklist?: Element[]
  interactiveWhitelist?: Element[]
}
```
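
As an illustration of the `transformPageContent` option (the Data Masking feature), the sketch below replaces e-mail addresses before page content would be sent to the LLM. The function name `maskEmails`, the regular expression, and the `[email]` placeholder are assumptions made for this example; only the `(content: string) => string | Promise<string>` signature comes from the interface above.

```typescript
// Illustrative pattern for e-mail-like strings; deliberately simple,
// not a complete address validator.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g

// Matches the synchronous form of the transformPageContent signature.
function maskEmails(content: string): string {
  return content.replace(EMAIL_PATTERN, '[email]')
}
```

Under these assumptions the function could be supplied as `transformPageContent: maskEmails` in the config; a real deployment would likely need rules tailored to its own sensitive data.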

### Packages

| Package                       | Description                        |
| ----------------------------- | ---------------------------------- |
| `page-agent`                  | Main entry with UI Panel           |
| `@page-agent/core`            | Core agent logic without UI        |
| `@page-agent/llms`            | LLM client with retry logic        |
| `@page-agent/page-controller` | DOM operations and visual feedback |
| `@page-agent/ui`              | Panel and i18n                     |

### Known Limitations

- Single-page application only (cannot navigate across pages)
- No visual recognition (relies on DOM structure)
- Limited interaction support (no hover, drag-drop, canvas operations)
- See [Limitations](https://alibaba.github.io/page-agent/#/docs/introduction/limitations) for details

### Acknowledgments

This project builds upon the excellent work of [browser-use](https://github.com/browser-use/browser-use). DOM processing components and prompts are adapted from browser-use (MIT License).

127
docs/CODE_OF_CONDUCT.md
Normal file
@@ -0,0 +1,127 @@
# Alibaba Open Source Code of Conduct

[¶Chinese version](#我们的保证)

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at opensource@alibaba-inc.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

---

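> Chinese Version
> Alibaba Open Source Code of Conduct

<a id="我们的保证"></a>

## Our Pledge

To foster an open, transparent, and friendly environment, we as contributors and maintainers pledge that participants in our project and community will be free from harassment, regardless of age, race, ethnicity, gender identity and expression, body size, physical ability, level of experience, nationality, personal appearance, religion, or sexual orientation.

## Our Standards

Behavior that helps create a positive environment includes, but is not limited to:

* Using friendly and inclusive language
* Respecting differing viewpoints and experiences
* Patiently accepting constructive criticism
* Focusing on what is best for the community
* Treating other community members with kindness

Unacceptable behavior by participants includes, but is not limited to:

* Using sexualized language or imagery, and unwelcome sexual advances
* Trolling, incitement, spreading rumors, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' personal information, such as a physical or electronic address, without permission
* Other conduct that could reasonably be considered inappropriate or a breach of professional ethics
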
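## Our Responsibilities

Project maintainers are responsible for interpreting the standards of acceptable behavior and for taking appropriate and fair corrective action in response to unacceptable behavior that has occurred.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that violate this Code of Conduct, and may temporarily or permanently ban any contributor whose behavior they consider inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within project spaces and in public spaces when an individual is representing the project or its community.

Examples of representing the project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.

How the project is represented may be further defined and clarified by its maintainers.
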
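## Enforcement

Abusive, harassing, or otherwise unacceptable behavior can be reported by contacting the project team at opensource@alibaba-inc.com.

All complaints that the maintenance team considers necessary and appropriate will be reviewed and investigated, and will receive a corresponding response. The project team is obligated to keep the identity of the reporter confidential. Further details of specific enforcement policies may be published separately.

Project maintainers who do not genuinely follow or enforce this Code of Conduct may have their participation temporarily or permanently revoked by decision of the project leaders or other members.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
available at [https://www.contributor-covenant.org/zh-cn/version/1/4/code-of-conduct.html](https://www.contributor-covenant.org/zh-cn/version/1/4/code-of-conduct.html)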
113
docs/README-zh.md
Normal file
@@ -0,0 +1,113 @@
# Page Agent

<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://img.alicdn.com/imgextra/i4/O1CN01qKig1P1FnhpFKNdi6_!!6000000000532-2-tps-1280-256.png">
<img alt="Page Agent Banner" src="https://img.alicdn.com/imgextra/i1/O1CN01NCMKXj1Gn4tkFTsxf_!!6000000000666-2-tps-1280-256.png">
</picture>

[MIT License](https://opensource.org/licenses/MIT) [TypeScript](http://www.typescriptlang.org/) [npm](https://www.npmjs.com/package/page-agent) [Bundlephobia](https://bundlephobia.com/package/page-agent) [GitHub](https://github.com/alibaba/page-agent)

A GUI agent implemented in pure JS. Control your web app with natural language. No backend, client, or browser extension required.

🌐 [English](../README.md) | **Chinese**

👉 <a href="https://alibaba.github.io/page-agent/" target="_blank"><b>🚀 Demo</b></a> | <a href="https://alibaba.github.io/page-agent/#/docs/introduction/overview" target="_blank"><b>📖 Documentation</b></a>

<video id="demo-video" src="https://github.com/user-attachments/assets/34d8444d-cbfb-44a3-a24e-fd5c167bb0bf" controls crossorigin muted></video>

---

## ✨ Features

- **🎯 Easy integration**
  - No `browser extension` / `Python` / `headless browser` required.
  - Pure in-page JavaScript; everything happens inside your web page.
  - The best tool for your agent to control web pages.
- **📖 Text-based DOM manipulation**
  - No screenshots, no OCR or multimodal models.
  - No special permissions required.
- **🧠 Bring your own LLM**
- **🎨 Polished UI with human-in-the-loop support**
- **🐙 Optional [Chrome extension](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension) for cross-page tasks.**

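## 💡 Use Cases

- **SaaS AI copilot**: Add an AI copilot to your product with a few lines of code, without rewriting your backend.
- **Smart form filling**: Turn 20 clicks into one sentence. A great companion for ERP, CRM, and admin systems.
- **Accessibility**: Make any web page accessible through natural language. Voice commands, screen readers, zero barrier.
- **Cross-page agent**: With the optional [Chrome extension](https://alibaba.github.io/page-agent/#/docs/features/chrome-extension), let your agent work across browser tabs.
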
## 🚀 Quick Start

### One-line integration

Try PageAgent quickly with our free Demo LLM:

```html
<script src="{URL}" crossorigin="true"></script>
```

| Mirrors | URL                                                                                |
| ------- | ---------------------------------------------------------------------------------- |
| Global  | https://cdn.jsdelivr.net/npm/page-agent@1.3.0/dist/iife/page-agent.demo.js         |
| China   | https://registry.npmmirror.com/page-agent/1.3.0/files/dist/iife/page-agent.demo.js |

> **⚠️ For technical evaluation only.** The Demo LLM has rate and usage limits, is slow, and may change at any time.

### NPM installation

```bash
npm install page-agent
```

```javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'deepseek-chat',
  baseURL: 'https://api.deepseek.com',
  apiKey: 'YOUR_API_KEY',
  language: 'zh-CN',
})

await agent.execute('Click the login button')
```

For more programmatic usage, see the [📖 documentation](https://alibaba.github.io/page-agent/#/docs/introduction/overview).

## 🤝 Contributing

Community contributions are welcome! See [CONTRIBUTING.md](../CONTRIBUTING.md) for environment setup and local development instructions.

Please read the [Code of Conduct](CODE_OF_CONDUCT.md) before contributing.

## 👏 Acknowledgments

This project builds on the excellent work of **[`browser-use`](https://github.com/browser-use/browser-use)**.

`PageAgent` is designed for **client-side web enhancement**, not as a server-side automation tool.

```
DOM processing components and prompt are derived from browser-use:

Browser Use
Copyright (c) 2024 Gregor Zunic
Licensed under the MIT License

Original browser-use project: <https://github.com/browser-use/browser-use>

We gratefully acknowledge the browser-use project and its contributors for their
excellent work on web automation and DOM interaction patterns that helped make
this project possible.

Third-party dependencies and their licenses can be found in the package.json
file and in the node_modules directory after installation.
```

## 📄 License

[MIT License](../LICENSE)

---

**⭐ If you find PageAgent useful or interesting, please give the project a star!**