PageAgentExt Architecture
MV3-compliant Chrome extension architecture.
Design Principles
- Service Worker is stateless - Only relays messages, no state
- Agent runs in SidePanel - All agent logic lives there
- Unidirectional communication - Agent → SW → Content
- Storage-based coordination - Mask state via chrome.storage
Environments
1. Side Panel (Agent Host)
Files: src/entrypoints/sidepanel/
- Hosts `PageAgentCore` and the execution loop
- Manages `TabsManager` for multi-tab control
- Uses `RemotePageController` for RPC to the content script
- Writes agent state to storage for mask coordination
Key Components:
- `AgentController` - Agent lifecycle; writes `agentState` to storage
- `useAgent` hook - React integration
- `App.tsx` - Main UI
2. Background (Service Worker)
File: src/entrypoints/background.ts
Only two responsibilities:
- Relay `AGENT_TO_PAGE` messages to the content script
- Broadcast `TAB_CHANGE` events
No state, no agent logic.
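Below is a minimal sketch of such a relay. It assumes a message shape of `{ type, tabId, method, args }`; the real shape lives in `protocol.ts` and may differ. The chrome APIs are injected through a hypothetical `RelayApis` interface so the logic can run outside the browser. The commented wiring shows how that interface might map onto `chrome.tabs.sendMessage` and `chrome.runtime.sendMessage`. The document does not say which tab events trigger `TAB_CHANGE`, so `chrome.tabs.onActivated` is only an example.

```typescript
// Hypothetical message shape; the real definition lives in protocol.ts.
interface AgentToPageMessage {
  type: 'AGENT_TO_PAGE'
  tabId: number // tab whose content script should receive the call
  method: string
  args: unknown[]
}

// Minimal slice of the chrome.* APIs the relay needs, injected so it can be tested.
interface RelayApis {
  sendToTab(tabId: number, message: unknown): Promise<unknown>
  broadcast(message: unknown): void
}

// Builds a chrome.runtime.onMessage-style listener. Nothing is stored between calls.
function createRelayListener(apis: RelayApis) {
  return (
    message: { type?: unknown },
    _sender: unknown,
    sendResponse: (response?: unknown) => void,
  ): boolean => {
    if (message.type !== 'AGENT_TO_PAGE') return false // not ours; let other listeners handle it
    const msg = message as AgentToPageMessage
    apis.sendToTab(msg.tabId, msg).then(
      (result) => sendResponse(result),
      (err: unknown) => sendResponse({ error: String(err) }),
    )
    return true // keep the channel open until sendResponse runs
  }
}

// Forwards a tab event to listeners; again, nothing is remembered.
function broadcastTabChange(apis: RelayApis, tabId: number): void {
  apis.broadcast({ type: 'TAB_CHANGE', tabId })
}

// Possible wiring in background.ts (sketch):
// const apis: RelayApis = {
//   sendToTab: (tabId, msg) => chrome.tabs.sendMessage(tabId, msg),
//   // runtime.sendMessage reaches extension pages such as the side panel
//   broadcast: (msg) => { chrome.runtime.sendMessage(msg).catch(() => {}) },
// }
// chrome.runtime.onMessage.addListener(createRelayListener(apis))
// chrome.tabs.onActivated.addListener(({ tabId }) => broadcastTabChange(apis, tabId))
```

The listener reads only its arguments and the injected APIs. Because of that, the service worker can be suspended and restarted between messages without losing anything, which is how MV3 treats background code.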
3. Content Script
File: src/entrypoints/content.ts
- Hosts `PageController` (lazy-initialized)
- Handles RPC messages for DOM operations
- Polls storage every 1s for mask state
- Uses `document.visibilityState` to manage mask visibility
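Here is one way the content script could dispatch RPC calls with a lazily created `PageController`. The sketch assumes each request carries a method name and an argument array. `createRpcHandler` and `RpcMethod` are hypothetical names, and the factory stands in for however `content.ts` actually constructs its controller.

```typescript
// Hypothetical request shape carried by an AGENT_TO_PAGE message.
interface RpcRequest {
  method: string
  args: unknown[]
}

type RpcMethod = (...args: unknown[]) => unknown

// The controller is built on the first RPC and dropped again after `dispose`,
// so a later call starts with a fresh instance.
function createRpcHandler(createController: () => Record<string, RpcMethod>) {
  let controller: Record<string, RpcMethod> | null = null

  return async ({ method, args }: RpcRequest): Promise<unknown> => {
    if (controller === null) controller = createController() // lazy init
    const fn = controller[method]
    if (typeof fn !== 'function') throw new Error(`Unknown RPC method: ${method}`)
    const result = await fn.apply(controller, args) // keep `this` bound to the controller
    if (method === 'dispose') controller = null
    return result
  }
}

// Possible wiring in content.ts (sketch):
// chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
//   if (msg?.type !== 'AGENT_TO_PAGE') return false
//   handle(msg).then(sendResponse, (err) => sendResponse({ error: String(err) }))
//   return true
// })
```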
Architecture Diagram
```
┌────────────────────────────────────────────────────────────────────┐
│ Side Panel                                                         │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ AgentController                                                │ │
│ │ ┌───────────────┐ ┌───────────────┐ ┌──────────────────────┐   │ │
│ │ │ PageAgentCore │ │  TabsManager  │ │ RemotePageController │   │ │
│ │ └───────────────┘ └───────────────┘ └───────────┬──────────┘   │ │
│ └─────────────────┬───────────────────────────────┼──────────────┘ │
│                   │                               │                │
│                   │ write agentState              │ AGENT_TO_PAGE  │
│                   ▼                               ▼                │
└───────────────────┼───────────────────────────────┼────────────────┘
                    │                               │
           ┌────────┴────────┐    ┌─────────────────┴────────────────┐
           │ chrome.storage  │    │ Background (SW)                  │
           └────────┬────────┘    │ Message Relay (stateless)        │
                    │             │ TAB_CHANGE broadcast ──► all     │
                    │ poll        └─────────────────┬────────────────┘
                    │                               │ forward
                    ▼                               ▼
┌───────────────────┼───────────────────────────────┼────────────────┐
│ Content Script    │                               │                │
│ ┌─────────────────┴───────────────────────────────┴──────────────┐ │
│ │ PageController                                                 │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌────────────────────────┐     │ │
│ │ │  DOM Tree   │ │   Actions   │ │ Mask (storage polling  │     │ │
│ │ │             │ │             │ │ + visibilityState)     │     │ │
│ │ └─────────────┘ └─────────────┘ └────────────────────────┘     │ │
│ └────────────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```
Message Protocol
Only two message types:
| Type | Direction | Purpose |
|---|---|---|
| `AGENT_TO_PAGE` | SidePanel → SW → Content | RPC call to PageController |
| `TAB_CHANGE` | SW → All | Tab events broadcast |
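A possible TypeScript encoding of the two message types is a discriminated union, paired with a runtime guard for the untyped values that `chrome.runtime.onMessage` delivers. The field names (`tabId`, `method`, `args`) are assumptions for illustration, not the contents of `protocol.ts`.

```typescript
// Hypothetical encodings of the two message types; field names are assumptions.
interface AgentToPageMessage {
  type: 'AGENT_TO_PAGE'
  tabId: number // tab whose PageController should handle the call
  method: string // e.g. 'clickElement'
  args: unknown[]
}

interface TabChangeMessage {
  type: 'TAB_CHANGE'
  tabId: number
}

type ExtensionMessage = AgentToPageMessage | TabChangeMessage

// chrome.runtime.onMessage delivers untyped values; narrow them before use.
function isExtensionMessage(value: unknown): value is ExtensionMessage {
  if (typeof value !== 'object' || value === null) return false
  const msg = value as Record<string, unknown>
  if (typeof msg.tabId !== 'number') return false
  if (msg.type === 'TAB_CHANGE') return true
  return msg.type === 'AGENT_TO_PAGE' && typeof msg.method === 'string' && Array.isArray(msg.args)
}
```

Once the guard passes, a `switch (msg.type)` lets TypeScript narrow each branch to the matching interface.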
RPC Methods
- State: `getCurrentUrl`, `getLastUpdateTime`, `getBrowserState`
- DOM: `updateTree`, `cleanUpHighlights`
- Actions: `clickElement`, `inputText`, `selectOption`, `scroll`, `scrollHorizontally`, `executeJavascript`
- Lifecycle: `dispose`
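On the side-panel end, `RemotePageController` acts as an RPC proxy. The sketch below assumes a hypothetical injected `Transport` function that sends one `AGENT_TO_PAGE` message and resolves with the reply. Only a few methods are shown. Their argument lists are not documented here, so `clickElement` simply passes its arguments through.

```typescript
// Shape of the request the proxy sends; field names are assumptions.
interface RpcMessage {
  type: 'AGENT_TO_PAGE'
  tabId: number
  method: string
  args: unknown[]
}

// Delivers one message and resolves with the content script's reply.
// In the side panel this would wrap chrome.runtime.sendMessage.
type Transport = (message: RpcMessage) => Promise<unknown>

// Hypothetical proxy: each method call becomes one RPC round-trip.
class RemotePageControllerSketch {
  private readonly transport: Transport
  private tabId: number

  constructor(transport: Transport, tabId: number) {
    this.transport = transport
    this.tabId = tabId
  }

  // Follow the agent when it switches tabs; later calls target the new tab.
  setTab(tabId: number): void {
    this.tabId = tabId
  }

  private call(method: string, ...args: unknown[]): Promise<unknown> {
    return this.transport({ type: 'AGENT_TO_PAGE', tabId: this.tabId, method, args })
  }

  getCurrentUrl() {
    return this.call('getCurrentUrl')
  }

  getBrowserState() {
    return this.call('getBrowserState')
  }

  // Argument list not documented; forwarded untouched.
  clickElement(...args: unknown[]) {
    return this.call('clickElement', ...args)
  }

  dispose() {
    return this.call('dispose')
  }
}
```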
Mask Management
Mask visibility is managed autonomously by the content script, which polls storage.
Storage State
```typescript
// Stored in chrome.storage.local under the key 'agentState'
interface AgentState {
  tabId: number | null // Agent's current tab
  running: boolean // Agent is executing
}
```
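Values read back from `chrome.storage` are untyped, so a reader may want to validate them before trusting them. The guard below assumes the `AgentState` shape above. `parseAgentState` is a hypothetical helper.

```typescript
interface AgentState {
  tabId: number | null
  running: boolean
}

// Returns a well-formed AgentState, or undefined if the stored value is missing or malformed.
function parseAgentState(raw: unknown): AgentState | undefined {
  if (typeof raw !== 'object' || raw === null) return undefined
  const obj = raw as Record<string, unknown>
  const running = obj.running
  const tabId = obj.tabId
  if (typeof running !== 'boolean') return undefined
  if (tabId === null || typeof tabId === 'number') return { tabId, running }
  return undefined
}
```

A call site would look like `parseAgentState((await chrome.storage.local.get('agentState')).agentState)`. It treats anything malformed the same as "no agent running".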
Content Script Logic
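```typescript
// Assumed to be defined elsewhere in content.ts
declare const myTabId: number // this tab's ID
declare function showMask(): void
declare function hideMask(): void

// Every 1s: show the mask only while the agent runs on this tab and the tab is visible
setInterval(async () => {
  const { agentState } = await chrome.storage.local.get('agentState')
  const shouldShow =
    agentState?.running &&
    agentState?.tabId === myTabId &&
    document.visibilityState === 'visible'
  if (shouldShow) showMask()
  else hideMask()
}, 1000)
```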
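The visibility decision in this loop can be pulled out as a pure function, which makes it testable without a browser. The sketch also adds a hypothetical `createMaskUpdater` that calls show/hide only when the decision changes rather than on every tick. This is a suggested refinement, not necessarily what `content.ts` does.

```typescript
interface AgentState {
  tabId: number | null
  running: boolean
}

// Same condition as the polling loop, as a pure function.
function shouldShowMask(
  state: AgentState | undefined,
  myTabId: number,
  visibility: string, // value of document.visibilityState
): boolean {
  return state?.running === true && state.tabId === myTabId && visibility === 'visible'
}

// Calls show/hide only when the decision changes, avoiding redundant DOM work each second.
function createMaskUpdater(show: () => void, hide: () => void): (next: boolean) => void {
  let current: boolean | null = null // unknown until the first tick
  return (next) => {
    if (next === current) return
    current = next
    if (next) show()
    else hide()
  }
}
```

To use it, create `updateMask = createMaskUpdater(showMask, hideMask)` once outside the interval. Inside the callback, the last two lines then become `updateMask(shouldShowMask(agentState, myTabId, document.visibilityState))`.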
Agent Updates Storage
- Task start: `{ tabId, running: true }`
- Tab switch: `{ tabId: newTabId, running: true }`
- Task end: `{ tabId: null, running: false }`
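These three writes can be expressed as small constructors plus one write helper. The sketch assumes storage is injected as an object with a promise-returning `set`, which `chrome.storage.local` provides in MV3. `runWithMask` is a hypothetical wrapper that uses `try`/`finally`, so the task-end state is written even when the task throws and no stale mask is left behind.

```typescript
interface AgentState {
  tabId: number | null
  running: boolean
}

// The three documented writes as pure constructors.
const taskStarted = (tabId: number): AgentState => ({ tabId, running: true })
const tabSwitched = (newTabId: number): AgentState => ({ tabId: newTabId, running: true })
const taskEnded = (): AgentState => ({ tabId: null, running: false })

// Minimal slice of chrome.storage.local, injected so this runs without Chrome.
interface StorageLike {
  set(items: { [key: string]: unknown }): Promise<void>
}

function writeAgentState(storage: StorageLike, state: AgentState): Promise<void> {
  return storage.set({ agentState: state })
}

// Hypothetical wrapper: always writes the task-end state, even if the task throws.
async function runWithMask(
  storage: StorageLike,
  tabId: number,
  task: () => Promise<void>,
): Promise<void> {
  await writeAgentState(storage, taskStarted(tabId))
  try {
    await task()
  } finally {
    await writeAgentState(storage, taskEnded())
  }
}
```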
Multi-Tab Control
Tab Types
- Initial Tab - The tab where the user started the task
- Managed Tabs - Tabs opened by the agent via `open_new_tab`
Tab Grouping
Agent-opened tabs are grouped into a Chrome tab group titled `Task(<taskId>)`.
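Here is a sketch of the grouping step using `chrome.tabs.group` and `chrome.tabGroups.update`. Both are injected through a hypothetical `TabGroupApis` interface so the code can run without Chrome. The helper name, and the choice to create the group on the first agent-opened tab, are assumptions. In a real extension, `chrome.tabGroups` also requires the `"tabGroups"` permission in the manifest.

```typescript
// Minimal slices of chrome.tabs.group and chrome.tabGroups.update, injected for testability.
interface TabGroupApis {
  group(options: { tabIds: number[]; groupId?: number }): Promise<number>
  updateGroup(groupId: number, props: { title: string }): Promise<unknown>
}

// Hypothetical helper: put an agent-opened tab into the task's group, creating and
// titling the group on first use. Returns the group ID so later tabs can join it.
async function addToTaskGroup(
  apis: TabGroupApis,
  taskId: string,
  tabId: number,
  existingGroupId?: number,
): Promise<number> {
  if (existingGroupId !== undefined) {
    return apis.group({ tabIds: [tabId], groupId: existingGroupId })
  }
  const groupId = await apis.group({ tabIds: [tabId] }) // creates a new group
  await apis.updateGroup(groupId, { title: `Task(${taskId})` })
  return groupId
}

// Possible wiring in the side panel:
// const apis: TabGroupApis = {
//   group: (options) => chrome.tabs.group(options),
//   updateGroup: (id, props) => chrome.tabGroups.update(id, props),
// }
```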
File Structure
```
packages/extension/src/
├── agent/
│   ├── AgentController.ts      # Agent lifecycle, storage updates
│   ├── RemotePageController.ts # RPC proxy for PageController
│   ├── TabsManager.ts          # Multi-tab management
│   ├── protocol.ts             # Message types (AGENT_TO_PAGE, TAB_CHANGE)
│   ├── rpc.ts                  # RPC client
│   ├── tabTools.ts             # Agent tools for tab control
│   └── useAgent.ts             # React hook
├── entrypoints/
│   ├── background.ts           # Stateless SW relay
│   ├── content.ts              # Content script with storage polling
│   └── sidepanel/
│       ├── App.tsx
│       ├── components/
│       ├── index.html
│       └── main.tsx
├── components/ui/
└── utils/
```
Security
- API Key Storage - Keys are stored in `chrome.storage.local`
- Content Script Isolation - The content script runs in an isolated world
- Tab Restriction - The agent only controls its own tabs
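The tab restriction can be enforced with a simple allow-list that is checked before every RPC call. This is a hypothetical sketch: `AgentTabScope` is not a class from the codebase. It assumes that "its own tabs" means the initial tab plus any tabs opened via `open_new_tab`.

```typescript
// Hypothetical allow-list of tabs the current task may control.
class AgentTabScope {
  private readonly tabs = new Set<number>()

  constructor(initialTabId: number) {
    this.tabs.add(initialTabId) // the tab where the user started the task
  }

  // Register a tab the agent opened via open_new_tab.
  addManaged(tabId: number): void {
    this.tabs.add(tabId)
  }

  // Forget a tab once it is closed.
  remove(tabId: number): void {
    this.tabs.delete(tabId)
  }

  // Call before forwarding any RPC; throws for tabs outside the task.
  assertAllowed(tabId: number): void {
    if (!this.tabs.has(tabId)) {
      throw new Error(`Tab ${tabId} is not controlled by this task`)
    }
  }
}
```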