Instructions for coding assistants
Project Overview
This is a monorepo with npm workspaces containing two main packages:
- Core Library (packages/page-agent/) - Pure JavaScript/TypeScript AI agent library for browser DOM automation, published as page-agent on npm
- Website (packages/website/) - React documentation and landing page. Also serves as the demo and test page for the core library. Private package @page-agent/website
There are also internal packages, such as:
- Page Controller (packages/page-controller/) - DOM operations and element interactions module. Independent of the LLM, so it can be covered by unit tests.
Development Commands
Core Commands
npm start # Start website dev server
npm run dev # Same as start
npm run build # Build all packages
npm run build:lib # Build page-agent library only
npm run lint # ESLint with TypeScript strict rules
Package-specific Commands
# Core library
npm run build --workspace=page-agent
npm run build:watch --workspace=page-agent
# Website
npm run dev --workspace=@page-agent/website
npm run build --workspace=@page-agent/website
Architecture & Critical Patterns
Monorepo Structure
We adopt a very simple monorepo solution: ts reference + vite alias.
We use the same vite config for dev and bundling. Local packages (even when they are published to npm) will be bundled into artifacts instead of installed from npm. That is why we put local packages in devDependencies (with version "*") rather than dependencies.
You must update the related tsconfig and vite config whenever you add, remove, or rename a package.
packages/
├── page-agent/ # npm: "page-agent" ⭐ MAIN
│ ├── src/ # AI agent source
│ │ ├── PageAgent.ts # Main AI agent class
│ │ ├── tools/ # LLM tool definitions
│ │ ├── llms/ # LLM integration
│ │ └── ui/ # UI components
│ ├── vite.config.js # Library build (ES + UMD)
│ └── package.json
├── website/ # npm: "@page-agent/website" (private) ⭐ MAIN
│ ├── src/ # Website source
│ └── index.html # Entry of vite webpage
│
│ # ...internal packages below...
│
└── page-controller/ # npm: "@page-agent/page-controller"
    └── src/ # DOM operations source
        ├── PageController.ts # Main controller class
        ├── actions.ts # Element interaction actions
        └── dom/ # DOM tree extraction
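The alias half of the "ts reference + vite alias" setup described above can be sketched roughly as follows. This is only an illustration: localPackageAliases is a hypothetical helper, and the src/index.ts entry path is an assumption, not something taken from the actual vite config.

```typescript
// Hypothetical helper: maps each local package name to its source entry so the
// bundler resolves workspace packages from source instead of node_modules.
// The src/index.ts entry path is an assumption made for illustration.
function localPackageAliases(
  repoRoot: string,
  packageDirs: Record<string, string>,
): Record<string, string> {
  const aliases: Record<string, string> = {}
  for (const [name, dir] of Object.entries(packageDirs)) {
    aliases[name] = `${repoRoot}/packages/${dir}/src/index.ts`
  }
  return aliases
}

// npm package name -> directory under packages/
const alias = localPackageAliases('/repo', {
  'page-agent': 'page-agent',
  '@page-agent/page-controller': 'page-controller',
})
```

An object of this shape is what Vite accepts under resolve.alias, so adding, removing, or renaming a package means touching this map and the matching tsconfig references together.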
Module Boundaries (Critical)
- Website (packages/website/): CAN import from page-agent for demos. Alias @/ → website/src/
- Page Agent (packages/page-agent/): The core lib. Imports from all internal packages. Never import from website.
- Page Controller (packages/page-controller/): Internal lib. Pure DOM operations, NO LLM dependency. Never import from page-agent.
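The boundaries above can be written down as a small allow-list, which may help when reviewing a new import. This is a sketch, not a lint rule that exists in the repo: canImport and allowedImports are hypothetical names, and treating every unlisted direction as forbidden is an assumption.

```typescript
type PackageName = 'website' | 'page-agent' | 'page-controller'

// Allowed import targets per package, transcribed from the rules above.
// Assumption: any direction not listed here is treated as forbidden.
const allowedImports: Record<PackageName, PackageName[]> = {
  website: ['page-agent'],
  'page-agent': ['page-controller'],
  'page-controller': [],
}

// Returns true when code in `from` may import from `to`.
function canImport(from: PackageName, to: PackageName): boolean {
  return from === to || allowedImports[from].includes(to)
}

const websiteUsesAgent = canImport('website', 'page-agent')
const controllerUsesAgent = canImport('page-controller', 'page-agent')
```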
PageController ↔ PageAgent Communication
All communication between PageAgent and PageController is async and isolated:
// PageAgent delegates DOM operations to PageController
await this.pageController.updateTree() // Refresh DOM state
await this.pageController.clickElement(index) // Click by index
await this.pageController.inputText(index, text)
await this.pageController.scroll({ down: true, numPages: 1 })
// PageController exposes state via async methods
const simplifiedHTML = await this.pageController.getSimplifiedHTML()
const pageInfo = await this.pageController.getPageInfo()
DOM element references and internal state (selectorMap, elementTextMap) are encapsulated in PageController.
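Because PageAgent only talks to PageController through async methods, agent-side logic can be exercised against a stand-in. The sketch below assumes a reduced interface: the method names come from the calls shown above, but the return types, PageControllerLike, FakePageController, and fillAndSubmit are all hypothetical.

```typescript
// Reduced, assumed view of the controller surface used above.
interface PageControllerLike {
  updateTree(): Promise<void>
  clickElement(index: number): Promise<void>
  inputText(index: number, text: string): Promise<void>
  getSimplifiedHTML(): Promise<string>
}

// Hypothetical test double that records calls instead of touching the DOM.
class FakePageController implements PageControllerLike {
  calls: string[] = []

  async updateTree(): Promise<void> {
    this.calls.push('updateTree')
  }
  async clickElement(index: number): Promise<void> {
    this.calls.push(`clickElement:${index}`)
  }
  async inputText(index: number, text: string): Promise<void> {
    this.calls.push(`inputText:${index}:${text}`)
  }
  async getSimplifiedHTML(): Promise<string> {
    return '[0]<input /> [1]<button>Submit</button>'
  }
}

// Hypothetical agent-side step: refresh, type, click, refresh, read state.
async function fillAndSubmit(
  controller: PageControllerLike,
  inputIndex: number,
  buttonIndex: number,
  text: string,
): Promise<string> {
  await controller.updateTree()
  await controller.inputText(inputIndex, text)
  await controller.clickElement(buttonIndex)
  await controller.updateTree()
  return controller.getSimplifiedHTML()
}

const fake = new FakePageController()
const result = fillAndSubmit(fake, 0, 1, 'hello')
```

The caller never holds a DOM element, only indices, which is the isolation the section describes.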
DOM Pipeline
- DOM Extraction: Convert live DOM to FlatDomTree via page-controller/src/dom/dom_tree/
- Dehydration: DOM tree → simplified text for LLM processing
- LLM Processing: AI model returns action plans (in page-agent)
- Indexed Operations: PageAgent calls PageController methods by element index
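To make the dehydration step concrete, here is a toy version under simplifying assumptions. FlatNode is a hypothetical, much reduced stand-in for the real FlatDomTree nodes, and the [index]<tag> output format is illustrative rather than the format the library actually emits.

```typescript
// Hypothetical, simplified node shape; the real FlatDomTree is richer.
interface FlatNode {
  tag: string
  text: string
  // Present only for elements the agent may act on.
  interactiveIndex?: number
}

// Turns a flat node list into indexed plain text for the LLM.
function dehydrate(nodes: FlatNode[]): string {
  return nodes
    .map((node) =>
      node.interactiveIndex === undefined
        ? node.text
        : `[${node.interactiveIndex}]<${node.tag}>${node.text}</${node.tag}>`,
    )
    .filter((line) => line.length > 0)
    .join('\n')
}

const sample: FlatNode[] = [
  { tag: 'h1', text: 'Sign in' },
  { tag: 'input', text: 'Email', interactiveIndex: 0 },
  { tag: 'button', text: 'Submit', interactiveIndex: 1 },
]
const simplified = dehydrate(sample)
```

The indices in the text are what the LLM refers to in its action plan, and what PageAgent later passes to methods such as clickElement(index).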
Event Bus Communication
Use src/utils/bus.ts for decoupled PageAgent ↔ UI communication:
// Emit from PageAgent
getEventBus().emit('panel:show')
getEventBus().emit('panel:update', { status: 'thinking' })
// Listen in UI components
getEventBus().on('panel:show', () => panel.show())
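For readers unfamiliar with the pattern, a typed event bus can be sketched in a few lines. This is not the contents of src/utils/bus.ts: TypedBus and PanelEvents are hypothetical, and only the two event names shown above are taken from the source.

```typescript
// Hypothetical event map: event name -> payload type.
interface PanelEvents {
  'panel:show': undefined
  'panel:update': { status: string }
}

class TypedBus<Events> {
  private handlers = new Map<keyof Events, Array<(payload: unknown) => void>>()

  on<K extends keyof Events>(event: K, handler: (payload: Events[K]) => void): void {
    const list = this.handlers.get(event) ?? []
    list.push(handler as (payload: unknown) => void)
    this.handlers.set(event, list)
  }

  // Simplification: payload is optional so payload-free events can be emitted
  // with one argument. A wrongly shaped payload is still a compile-time error.
  emit<K extends keyof Events>(event: K, payload?: Events[K]): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload)
  }
}

const bus = new TypedBus<PanelEvents>()
const log: string[] = []
bus.on('panel:show', () => log.push('show'))
bus.on('panel:update', (payload) => log.push(`update:${payload.status}`))
bus.emit('panel:show')
bus.emit('panel:update', { status: 'thinking' })
```

The emitter and the listeners share only the event map, so PageAgent never needs a direct reference to a UI component.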
Hash Routing Requirement
The website uses wouter with useHashLocation so that routes work on static hosting:
<Router hook={useHashLocation}> {/* Always hash-based routes */}
CDN Auto-Injection Pattern
Library auto-initializes when injected via script tag:
<script src="page-agent.js?model=gpt-4"></script>
Query params configure PageAgentConfig automatically in src/entry.ts.
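One plausible way to turn the script URL's query string into config values is shown below. It is a sketch only: configFromScriptSrc is a hypothetical function, the placeholder base URL exists just to resolve a relative src, and the real logic in src/entry.ts may differ.

```typescript
// Hypothetical helper: reads query params from a script src into a flat
// key/value object that could seed a config such as PageAgentConfig.
function configFromScriptSrc(src: string): Record<string, string> {
  // The base is a placeholder so relative paths like "page-agent.js?..." parse.
  const url = new URL(src, 'https://example.invalid/')
  const config: Record<string, string> = {}
  url.searchParams.forEach((value, key) => {
    config[key] = value
  })
  return config
}

const config = configFromScriptSrc('page-agent.js?model=gpt-4')
```

In a browser, the src would typically come from document.currentScript, which is available while the script is executing.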
Key Files Reference
Page Agent (packages/page-agent/)
| File | Description |
|---|---|
| src/PageAgent.ts | ⭐ Main AI agent class orchestrating tools and LLM |
| src/entry.ts | CDN/UMD entry point with auto-initialization |
| src/tools/ | Tool definitions that call PageController methods |
| src/utils/bus.ts | Type-safe event bus for decoupled communication |
| src/ui/ | UI components (Panel, SimulatorMask) with CSS modules |
| src/llms/ | LLM integration and communication layer |
| src/patches/ | Framework-specific optimizations (React, Antd) |
| vite.config.js | Library build configuration (ES + UMD) |
Page Controller (packages/page-controller/)
| File | Description |
|---|---|
| src/PageController.ts | ⭐ Main controller class managing DOM state and actions |
| src/actions.ts | Element interaction implementations (click, input, scroll) |
| src/dom/dom_tree/index.js | Core DOM extraction engine (ported from browser-use) |
| src/dom/getPageInfo.ts | Page scroll/size information |
| src/types.ts | TypeScript interfaces for controller |
Website (packages/website/)
| File | Description |
|---|---|
| src/router.tsx | ⭐ Central routing (manual registration required) |
| src/components/DocsLayout.tsx | Navigation structure (hardcoded nav items) |
| src/main.tsx | Site entry with hash routing setup |
| src/docs/[section]/[topic]/page.tsx | Documentation pages |
| src/test-pages/ | Library integration test pages |
| vite.config.js | Website build configuration |
Adding New Features
New Documentation Page
- Create packages/website/src/docs/<section>/<slug>/page.tsx
- Add route to packages/website/src/router.tsx with <Header /> + <DocsLayout> wrapper
- Add navigation item to DocsLayout.tsx
New Agent Tool
- Implement tool in packages/page-agent/src/tools/index.ts
- If tool needs DOM operations, add method to PageController first
- Tool calls this.pageController.methodName() for DOM interactions
New PageController Action
- Add action implementation in packages/page-controller/src/actions.ts
- Expose via async method in PageController.ts
- Export from packages/page-controller/src/index.ts
New UI Component
- Create in packages/page-agent/src/ui/ with colocated CSS modules
- Use event bus for PageAgent communication
Code Standards
TypeScript
- Explicit typing for exported/public APIs
- ESLint relaxes some unsafe rules for rapid iteration
CSS & Styling
- Prefer Tailwind CSS over custom CSS
- Custom CSS variables for theme gradients in src/index.css
- Dark mode support via dark: classes
- CSS modules for component-specific styles
Import Organization
- External libraries first
- Internal modules (@/, @pages/)
- Relative imports last
- Blank lines between groups
Debugging Common Issues
Blank Documentation Pages
- Verify route exists in packages/website/src/router.tsx
- Check component import path
- Verify CSS isn't hiding content (check dark mode classes)
- Test with minimal component first
Library Integration Issues
- Check that packages/page-agent/dist/lib/page-agent.umd.js builds correctly
- Test CDN injection with query params
- Verify event bus communications are properly typed
- Use packages/website/src/test-pages/ for isolated testing