Files
page-agent/AGENTS.md
2026-01-13 20:52:41 +08:00

132 lines
5.3 KiB
Markdown

# Instructions for Coding Assistants
## Project Overview
This is a **monorepo** with npm workspaces:
- **Core Library** (`packages/page-agent/`) - AI agent for browser DOM automation, published as `page-agent` on npm
- **Website** (`packages/website/`) - React docs and landing page. **When working on website, follow `packages/website/AGENTS.md`**
Internal packages:
- **CDN** (`packages/cdn/`) - IIFE builds for script tag usage (npm: `@page-agent/cdn`)
- **LLMs** (`packages/llms/`) - LLM client with reflection-before-action mental model
- **Page Controller** (`packages/page-controller/`) - DOM operations and visual feedback (SimulatorMask), independent of LLM
- **UI** (`packages/ui/`) - Panel and i18n. Decoupled from PageAgent
## Development Commands
```bash
npm start # Start website dev server
npm run build # Build all packages
npm run build:libs # Build all libraries
npm run lint # ESLint with TypeScript strict rules
```
## Architecture
### Monorepo Structure
Simple monorepo solution: TypeScript references + Vite aliases. Update tsconfig and vite config when adding/removing packages.
```
packages/
├── page-agent/ # npm: "page-agent" ⭐ MAIN
├── cdn/ # npm: "@page-agent/cdn" (IIFE builds)
├── website/ # @page-agent/website (private)
├── llms/ # @page-agent/llms
├── page-controller/ # @page-agent/page-controller
└── ui/ # @page-agent/ui
```
`workspaces` in `package.json` must be in topological order.
### Module Boundaries
- **Page Agent**: Core lib. Imports from `@page-agent/llms`, `@page-agent/page-controller`, `@page-agent/ui`
- **LLMs**: LLM client with MacroToolInput contract. No dependency on page-agent
- **UI**: Panel and i18n. No dependency on page-agent
- **Page Controller**: DOM operations with optional visual feedback (SimulatorMask). No LLM dependency. Enable mask via `enableMask: true` config
### PageController ↔ PageAgent Communication
All communication is async and isolated:
```typescript
// PageAgent delegates DOM operations to PageController
await this.pageController.updateTree()
await this.pageController.clickElement(index)
await this.pageController.inputText(index, text)
await this.pageController.scroll({ down: true, numPages: 1 })
// PageController exposes state via async methods
const simplifiedHTML = await this.pageController.getSimplifiedHTML()
const pageInfo = await this.pageController.getPageInfo()
```
### DOM Pipeline
1. **DOM Extraction**: Live DOM → `FlatDomTree` via `page-controller/src/dom/dom_tree/`
2. **Dehydration**: DOM tree → simplified text for LLM
3. **LLM Processing**: AI returns action plans (page-agent)
4. **Indexed Operations**: PageAgent calls PageController by element index
### CDN Builds (`packages/cdn/`)
Two IIFE builds for script tag usage:
| Build | File | Description |
| ----- | -------------------- | -------------------------------- |
| Demo | `page-agent.demo.js` | Auto-init with built-in test API |
| Full | `page-agent.js` | Exposes `PageAgent` class only |
Demo build supports query params (e.g., `?model=gpt-4&lang=en-US`).
## Key Files Reference
### Page Agent (`packages/page-agent/`)
| File | Description |
| ------------------ | --------------------------------------- |
| `src/PageAgent.ts` | ⭐ Main AI agent class |
| `src/umd.ts` | CDN/UMD entry with auto-init |
| `src/tools/` | Tool definitions calling PageController |
### LLMs (`packages/llms/`)
| File | Description |
| ---------------------------- | ------------------------------------- |
| `src/index.ts` | ⭐ LLM class with retry logic |
| `src/types.ts` | MacroToolInput, AgentBrain, LLMConfig |
| `src/OpenAILenientClient.ts` | OpenAI-compatible client |
### Page Controller (`packages/page-controller/`)
| File | Description |
| --------------------------- | ---------------------------------------------------------- |
| `src/PageController.ts` | ⭐ Main controller class with optional mask support |
| `src/SimulatorMask.ts` | Visual overlay blocking user interaction during automation |
| `src/actions.ts` | Element interactions (click, input, scroll) |
| `src/dom/dom_tree/index.js` | Core DOM extraction engine |
## Adding New Features
### New Agent Tool
1. Implement in `packages/page-agent/src/tools/index.ts`
2. If tool needs DOM ops, add method to PageController first
3. Tool calls `this.pageController.methodName()` for DOM interactions
### New PageController Action
1. Add implementation in `packages/page-controller/src/actions.ts`
2. Expose via async method in `PageController.ts`
3. Export from `packages/page-controller/src/index.ts`
## Code Standards
- Explicit typing for exported/public APIs
- ESLint relaxes some unsafe rules for rapid iteration
- Every change you make should not only implement the desired functionality but also improve the quality of the codebase
- All code and comments must be in English.