Files
page-agent/packages/extension/structure.md

186 lines
8.5 KiB
Markdown

# PageAgentExt Architecture
MV3-compliant Chrome extension architecture.
## Design Principles
1. **Service Worker is stateless** - Only relays messages, no state
2. **Agent runs in SidePanel** - All agent logic lives there
3. **Unidirectional communication** - Agent → SW → Content
4. **Storage-based coordination** - Mask state via chrome.storage
## Environments
### 1. Side Panel (Agent Host)
**Files:** `src/entrypoints/sidepanel/`
- Hosts `PageAgentCore` and execution loop
- Manages `TabsManager` for multi-tab control
- Uses `RemotePageController` for RPC to content script
- Writes agent state to storage for mask coordination
**Key Components:**
- `AgentController` - Agent lifecycle, writes `agentState` to storage
- `useAgent` hook - React integration
- `App.tsx` - Main UI
### 2. Background (Service Worker)
**File:** `src/entrypoints/background.ts`
**Only two responsibilities:**
1. Relay `AGENT_TO_PAGE` messages to content script
2. Broadcast `TAB_CHANGE` events
**No state, no agent logic.**
### 3. Content Script
**File:** `src/entrypoints/content.ts`
- Hosts `PageController` (lazy-initialized)
- Handles RPC messages for DOM operations
- Polls storage every 1s for mask state
- Uses `document.visibilityState` to manage mask visibility
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────┐
│ Side Panel │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ AgentController │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ │ PageAgentCore│ │ TabsManager │ │RemotePageController│ │ │
│ │ └──────────────┘ └──────────────┘ └────────┬─────────┘ │ │
│ └───────────────────────────────────────────────┼────────────┘ │
│ │ │ │
│ │ write agentState │ AGENT_TO_PAGE │
│ ▼ ▼ │
└─────────────────────────┼────────────────────────┼───────────────┘
│ │
┌─────────┴─────────┐ │
│ chrome.storage │ │
└─────────┬─────────┘ │
│ │
│ poll │
│ ▼
┌─────────────────────────┼─────────────────────────────────────────┐
│ │ Background (SW) │
│ │ ┌────────────────┐ │
│ │ │ Message Relay │ │
│ │ │ (stateless) │ │
│ │ └───────┬────────┘ │
│ │ │ │
│ TAB_CHANGE broadcast ──┼─────────────┼─────────────► │
└─────────────────────────┼─────────────┼────────────────────────────┘
│ │ forward
│ ▼
┌─────────────────────────┼─────────────────────────────────────────┐
│ Content Script │ │
│ ┌──────────────────────┴───────────────────────────────────────┐ │
│ │ PageController │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │ │
│ │ │ DOM Tree │ │ Actions │ │ Mask (storage │ │ │
│ │ │ │ │ │ │ polling + vis) │ │ │
│ │ └─────────────┘ └─────────────┘ └──────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
## Message Protocol
Only two message types:
| Type | Direction | Purpose |
|------|-----------|---------|
| `AGENT_TO_PAGE` | SidePanel → SW → Content | RPC call to PageController |
| `TAB_CHANGE` | SW → All | Tab events broadcast |
### RPC Methods
- State: `getCurrentUrl`, `getLastUpdateTime`, `getBrowserState`
- DOM: `updateTree`, `cleanUpHighlights`
- Actions: `clickElement`, `inputText`, `selectOption`, `scroll`, `scrollHorizontally`, `executeJavascript`
- Lifecycle: `dispose`
## Mask Management
Mask visibility is managed autonomously by content script via storage polling.
### Storage State
```typescript
interface AgentState {
tabId: number | null // Agent's current tab
running: boolean // Agent is executing
}
// Key: 'agentState'
```
### Content Script Logic
```typescript
setInterval(async () => {
const { agentState } = await chrome.storage.local.get('agentState')
const shouldShow =
agentState?.running &&
agentState?.tabId === myTabId &&
document.visibilityState === 'visible'
if (shouldShow) showMask()
else hideMask()
}, 1000)
```
### Agent Updates Storage
- Task start: `{ tabId, running: true }`
- Tab switch: `{ tabId: newTabId, running: true }`
- Task end: `{ tabId: null, running: false }`
## Multi-Tab Control
### Tab Types
- **Initial Tab** - Where user started the task
- **Managed Tabs** - Tabs opened by agent via `open_new_tab`
### Tab Grouping
Agent-opened tabs are grouped in Chrome tab group `Task(<taskId>)`.
## File Structure
```
packages/extension/src/
├── agent/
│ ├── AgentController.ts # Agent lifecycle, storage updates
│ ├── RemotePageController.ts # RPC proxy for PageController
│ ├── TabsManager.ts # Multi-tab management
│ ├── protocol.ts # Message types (AGENT_TO_PAGE, TAB_CHANGE)
│ ├── rpc.ts # RPC client
│ ├── tabTools.ts # Agent tools for tab control
│ └── useAgent.ts # React hook
├── entrypoints/
│ ├── background.ts # Stateless SW relay
│ ├── content.ts # Content script with storage polling
│ └── sidepanel/
│ ├── App.tsx
│ ├── components/
│ ├── index.html
│ └── main.tsx
├── components/ui/
└── utils/
```
## Security
1. **API Key Storage** - Keys in `chrome.storage.local`
2. **Content Script Isolation** - Runs in isolated world
3. **Tab Restriction** - Agent only controls its own tabs