feat(ext): draft extension structure (single-page mode)
292
packages/extension/structure.md
Normal file
@@ -0,0 +1,292 @@
# PageAgentExt Architecture

This document describes the architecture of the Chrome extension version of PageAgent, including environment definitions, communication protocols, and extension considerations.

## Environment Definitions

The extension operates across three isolated JavaScript contexts:

### 1. Background (Service Worker)

**File:** `src/entrypoints/background.ts`

**Responsibilities:**

- Hosts the headless `PageAgentCore` instance
- Manages agent lifecycle (create, execute, stop, dispose)
- Stores LLM configuration in `chrome.storage.local`
- Receives commands from SidePanel via messaging
- Broadcasts events to SidePanel for UI updates
- Uses `RemotePageController` to proxy DOM operations to ContentScript

**Key Components:**

- `PageAgentCore` - The AI agent (from `@page-agent/core`)
- `RemotePageController` - Proxy that forwards calls to ContentScript
- Command handlers for `agent:execute`, `agent:stop`, `agent:configure`

### 2. Content Script

**File:** `src/entrypoints/content.ts`

**Responsibilities:**

- Runs in the context of web pages
- Hosts the real `PageController` instance
- Performs actual DOM operations (click, input, scroll, etc.)
- Responds to RPC messages from Background
- Manages visual mask overlay during automation

**Key Components:**

- `PageController` - DOM controller (from `@page-agent/page-controller`)
- RPC handlers for all PageController methods

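The RPC handlers listed above can be pictured as a lookup table from message name to controller call. The following is a minimal sketch under stated assumptions: `Controller` is a simplified, hypothetical subset of the real `PageController` surface, `handleRpc` and `fakeController` are illustrative names, and the actual wiring through the messaging library is not shown.

```typescript
// Hypothetical, simplified subset of the PageController surface.
interface ActionResult { success: boolean; message: string }
interface Controller {
  getCurrentUrl(): string
  clickElement(index: number): ActionResult
  inputText(index: number, text: string): ActionResult
}

// One handler per RPC message name; each unpacks the payload and calls the controller.
type RpcHandlers = Record<string, (c: Controller, data: any) => unknown>

const handlers: RpcHandlers = {
  'rpc:getCurrentUrl': (c) => c.getCurrentUrl(),
  'rpc:clickElement': (c, index: number) => c.clickElement(index),
  'rpc:inputText': (c, d: { index: number; text: string }) => c.inputText(d.index, d.text),
}

// Dispatch an incoming message; unknown names fail loudly instead of being ignored.
function handleRpc(c: Controller, name: string, data?: unknown): unknown {
  const handler = handlers[name]
  if (!handler) throw new Error(`Unknown RPC message: ${name}`)
  return handler(c, data)
}

// In-memory stand-in for the real PageController, for illustration only.
const fakeController: Controller = {
  getCurrentUrl: () => 'https://example.com/',
  clickElement: (index) => ({ success: true, message: `clicked ${index}` }),
  inputText: (index, text) => ({ success: true, message: `typed "${text}" into ${index}` }),
}
```

Keeping the table keyed by the same `rpc:*` names as the protocol makes it easy to spot a method that has a protocol entry but no handler.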
### 3. Side Panel (React UI)

**Files:** `src/entrypoints/sidepanel/`

**Responsibilities:**

- Provides user interface for controlling the agent
- Displays task input and execution history
- Shows real-time agent activity (thinking, executing, etc.)
- Manages LLM configuration settings
- Sends commands to Background and receives event updates

**Key Components:**

- `App.tsx` - Main React component with chat-style UI
- `ConfigPanel` - Settings form for LLM configuration
- Event subscription for real-time updates

## Communication Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                           Side Panel                            │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │  Task Input  │  │ Event Stream │  │    History Display    │  │
│  └──────┬───────┘  └──────▲───────┘  └───────────────────────┘  │
└─────────┼─────────────────┼─────────────────────────────────────┘
          │ Commands        │ Events
          ▼                 │
┌─────────────────────────────────────────────────────────────────┐
│                           Background                            │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                       PageAgentCore                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐   │  │
│  │  │     LLM     │  │    Tools    │  │  RemotePageCtrl  │   │  │
│  │  └─────────────┘  └─────────────┘  └────────┬─────────┘   │  │
│  └─────────────────────────────────────────────┼─────────────┘  │
└────────────────────────────────────────────────┼────────────────┘
                                                 │ RPC
                                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Content Script                          │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      PageController                       │  │
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────┐   │  │
│  │  │  DOM Tree   │  │   Actions   │  │       Mask       │   │  │
│  │  └─────────────┘  └─────────────┘  └──────────────────┘   │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                         ┌───────────────┐
                         │   Web Page    │
                         │      DOM      │
                         └───────────────┘
```

## Message Protocol

All cross-context communication uses `@webext-core/messaging` for type safety.

### Protocol Definition

**File:** `src/messaging/protocol.ts`

### 1. RPC Protocol (Background → ContentScript)

Used by `RemotePageController` to call `PageController` methods.

```typescript
interface PageControllerRPCProtocol {
  // State queries
  'rpc:getCurrentUrl': () => string
  'rpc:getLastUpdateTime': () => number
  'rpc:getBrowserState': () => BrowserState

  // DOM operations
  'rpc:updateTree': () => string
  'rpc:cleanUpHighlights': () => void

  // Element actions
  'rpc:clickElement': (index: number) => ActionResult
  'rpc:inputText': (data: { index: number; text: string }) => ActionResult
  'rpc:selectOption': (data: { index: number; optionText: string }) => ActionResult
  'rpc:scroll': (options: ScrollOptions) => ActionResult
  'rpc:scrollHorizontally': (options: ScrollHorizontallyOptions) => ActionResult
  'rpc:executeJavascript': (script: string) => ActionResult

  // Mask operations
  'rpc:showMask': () => void
  'rpc:hideMask': () => void

  // Lifecycle
  'rpc:dispose': () => void
}
```

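To make the proxy idea concrete, here is a minimal sketch of how a `RemotePageController` might turn method calls into RPC messages. It is an illustration under assumptions, not the implementation in `src/agent/RemotePageController.ts`: the `Send` transport type and the `recordingSend` stand-in are hypothetical, and only three of the protocol's methods are shown.

```typescript
// Hypothetical transport: in the extension this would wrap the messaging
// library's send call, targeted at the content script of the controlled tab.
type Send = (name: string, data?: unknown) => Promise<unknown>

interface ActionResult { success: boolean; message: string }

// Proxy with the same method names as PageController; every call becomes one RPC message.
class RemotePageController {
  private readonly send: Send

  constructor(send: Send) {
    this.send = send
  }

  getCurrentUrl(): Promise<string> {
    return this.send('rpc:getCurrentUrl') as Promise<string>
  }

  clickElement(index: number): Promise<ActionResult> {
    return this.send('rpc:clickElement', index) as Promise<ActionResult>
  }

  // Multi-argument methods are packed into a single payload object,
  // matching the `data` parameter shape in the protocol above.
  inputText(index: number, text: string): Promise<ActionResult> {
    return this.send('rpc:inputText', { index, text }) as Promise<ActionResult>
  }
}

// Recording transport used here instead of a real content script.
const sent: Array<{ name: string; data: unknown }> = []
const recordingSend: Send = async (name, data) => {
  sent.push({ name, data })
  return { success: true, message: 'ok' }
}

const remote = new RemotePageController(recordingSend)
void remote.inputText(2, 'hello')
```

Because every method returns a promise, the agent has to treat all controller calls as asynchronous, even those that are synchronous on the real `PageController`.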
### 2. Command Protocol (SidePanel → Background)

Used by SidePanel UI to control the agent.

```typescript
interface AgentCommandProtocol {
  'agent:execute': (task: string) => void
  'agent:stop': () => void
  'agent:getState': () => AgentState
  'agent:configure': (config: LLMConfig) => void
}
```

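A minimal sketch of what the Background side of these commands could look like, reduced to plain state handling. The `LLMConfig` fields, the `'running'` status value, and the `CommandHandlers` class are assumptions made for illustration; the real handlers would also create and drive `PageAgentCore`.

```typescript
// Assumed shapes; the real types are defined in the extension's protocol module.
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'
interface LLMConfig { baseURL: string; apiKey: string; model: string }
interface AgentState { status: AgentStatus; task: string | null }

class CommandHandlers {
  private state: AgentState = { status: 'idle', task: null }
  private config: LLMConfig | null = null

  // 'agent:configure': remember the config for the next agent creation.
  configure(config: LLMConfig): void {
    this.config = config
  }

  // 'agent:execute': refuse to start without a config or while a task is running.
  execute(task: string): void {
    if (!this.config) throw new Error('LLM is not configured')
    if (this.state.status === 'running') throw new Error('A task is already running')
    this.state = { status: 'running', task }
  }

  // 'agent:stop': return to idle but keep the last task for display.
  stop(): void {
    this.state = { ...this.state, status: 'idle' }
  }

  // 'agent:getState': hand out a copy so callers cannot mutate internal state.
  getState(): AgentState {
    return { ...this.state }
  }
}
```

Rejecting a second `agent:execute` while a task is running is one possible policy; queueing the task or stopping the current one first would be equally valid choices.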
### 3. Event Protocol (Background → SidePanel)

Used by Background to push updates to SidePanel.

```typescript
interface AgentEventProtocol {
  'event:status': (status: AgentStatus) => void
  'event:history': (history: HistoricalEvent[]) => void
  'event:activity': (activity: AgentActivity) => void
  'event:stateSnapshot': (state: AgentState) => void
}
```

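The push model can be sketched as a small typed publish/subscribe helper. This is an in-process illustration only: `EventBus` and the payload shapes in `AgentEvents` are hypothetical, and in the extension the delivery would cross contexts through the messaging library rather than a local listener list.

```typescript
// Hypothetical payload shapes; the real types live in `src/messaging/protocol.ts`.
interface AgentEvents {
  'event:status': 'idle' | 'running' | 'completed' | 'error'
  'event:activity': { type: string; detail?: string }
}

type Listener<K extends keyof AgentEvents> = (payload: AgentEvents[K]) => void

class EventBus {
  // Listeners are stored untyped internally; `on` and `emit` keep the public API typed.
  private listeners: Record<string, Array<(payload: any) => void>> = {}

  // Subscribe and get back an unsubscribe function (useful in a React effect cleanup).
  on<K extends keyof AgentEvents>(name: K, listener: Listener<K>): () => void {
    const list = (this.listeners[name] ??= [])
    list.push(listener)
    return () => {
      const i = list.indexOf(listener)
      if (i >= 0) list.splice(i, 1)
    }
  }

  // Deliver one event to every current subscriber.
  emit<K extends keyof AgentEvents>(name: K, payload: AgentEvents[K]): void {
    for (const listener of [...(this.listeners[name] ?? [])]) listener(payload)
  }
}
```

Returning the unsubscribe function from `on` keeps the SidePanel from accumulating stale listeners when a component unmounts.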
## Communication Flow

### Task Execution Flow

```
1. User enters task in SidePanel
   └─> SidePanel sends 'agent:execute' command

2. Background receives command
   ├─> Creates PageAgentCore with RemotePageController
   └─> Starts task execution

3. Agent executes step loop:
   ├─> LLM generates next action
   ├─> Agent calls RemotePageController method
   │   └─> RPC message sent to ContentScript
   ├─> ContentScript executes on real PageController
   │   └─> RPC response returned
   ├─> Agent updates history
   └─> Background broadcasts events to SidePanel

4. SidePanel receives events
   └─> Updates UI (status, history, activity)

5. Task completes or user stops
   └─> Agent disposes, status changes to idle/completed/error
```

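Step 3 of the flow above can be sketched as a bounded loop. Everything here is a simplified stand-in: `runTask`, `Deps`, and the two-action vocabulary are hypothetical, and the real loop lives inside `PageAgentCore` with a much richer action set.

```typescript
// Simplified action vocabulary for illustration.
interface Action { name: 'click' | 'done'; index?: number }

// Injected dependencies so the loop can run without a browser or an LLM.
interface Deps {
  nextAction(history: string[]): Promise<Action>   // stands in for the LLM call
  click(index: number): Promise<string>             // stands in for RemotePageController
  broadcast(event: string, payload: unknown): void  // stands in for event broadcasting
}

async function runTask(task: string, deps: Deps, maxSteps = 10): Promise<string[]> {
  const history: string[] = [`task: ${task}`]
  deps.broadcast('event:status', 'running')

  for (let step = 0; step < maxSteps; step++) {
    // LLM generates the next action from the history so far.
    const action = await deps.nextAction(history)
    if (action.name === 'done') {
      deps.broadcast('event:status', 'completed')
      return history
    }
    // The action goes to the page via RPC; its result is appended to the history.
    history.push(await deps.click(action.index ?? 0))
    deps.broadcast('event:history', [...history])
  }

  // Step budget exhausted without the agent declaring completion.
  deps.broadcast('event:status', 'error')
  return history
}
```

The `maxSteps` cap is an assumption added here so that a misbehaving model cannot loop forever; the source does not say how the real agent bounds its steps.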
### Configuration Flow

```
1. User opens Settings in SidePanel
2. User enters API credentials
3. SidePanel sends 'agent:configure' command
4. Background saves config to chrome.storage.local
5. Next agent creation uses new config
```

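Steps 4 and 5 amount to a save followed by a later load. The sketch below assumes a hypothetical `LLMConfig` shape and storage key, and uses a small `StorageLike` interface modelled on the promise-based `chrome.storage.local.get` and `set` calls so that an in-memory stand-in can be used outside the extension.

```typescript
// Hypothetical config shape; field names are assumptions, not taken from the codebase.
interface LLMConfig { baseURL: string; apiKey: string; model: string }

// Minimal async key-value surface, modelled on `chrome.storage.local`.
interface StorageLike {
  get(key: string): Promise<Record<string, unknown>>
  set(items: Record<string, unknown>): Promise<void>
}

const CONFIG_KEY = 'llmConfig' // hypothetical storage key

// Step 4: validate, then persist the config.
async function saveConfig(storage: StorageLike, config: LLMConfig): Promise<void> {
  if (!config.apiKey.trim()) throw new Error('apiKey must not be empty')
  await storage.set({ [CONFIG_KEY]: config })
}

// Step 5: read the config back when the next agent is created.
async function loadConfig(storage: StorageLike): Promise<LLMConfig | null> {
  const items = await storage.get(CONFIG_KEY)
  return (items[CONFIG_KEY] as LLMConfig | undefined) ?? null
}

// In-memory stand-in for tests and local experiments.
function createMemoryStorage(): StorageLike {
  const data: Record<string, unknown> = {}
  return {
    get: async (key) => (key in data ? { [key]: data[key] } : {}),
    set: async (items) => { Object.assign(data, items) },
  }
}
```

Reading the config at agent creation time, rather than caching it in a module variable, also keeps it available after the service worker has been restarted.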
## File Structure

```
packages/extension/src/
├── agent/
│   └── RemotePageController.ts   # Proxy for PageController
├── entrypoints/
│   ├── background.ts             # Service worker
│   ├── content.ts                # Content script
│   └── sidepanel/
│       ├── index.html
│       ├── main.tsx
│       └── App.tsx               # Main UI component
├── messaging/
│   ├── protocol.ts               # Message type definitions
│   ├── rpc.ts                    # RPC client for PageController
│   ├── events.ts                 # Event broadcasting utilities
│   └── index.ts                  # Module exports
├── components/ui/                # shadcn components
├── lib/utils.ts                  # Utility functions
└── assets/index.css              # Tailwind styles
```

## Extension Considerations

### Current Limitations (v1)

1. **Single page control only** - Agent controls the active tab where SidePanel was opened
2. **No cross-tab navigation** - Cannot follow links that open in new tabs
3. **Session-based** - Agent state is not persisted across extension restarts

### Future Extension Points

#### Multi-tab Control

To support controlling multiple tabs:

1. Add `tabId` parameter to RPC messages
2. Track tab-to-controller mapping in Background
3. Allow SidePanel to switch between controlled tabs

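One way the tab-to-controller mapping might look is a small registry keyed by `tabId`. This is a sketch of a possible design, not existing code: `TabControllerRegistry` and its method names are hypothetical, and the controller type is left generic.

```typescript
// Keeps one controller per tab and remembers which tab the SidePanel is driving.
class TabControllerRegistry<C> {
  private readonly controllers = new Map<number, C>()
  private readonly create: (tabId: number) => C
  private activeTabId: number | null = null

  constructor(create: (tabId: number) => C) {
    this.create = create
  }

  // Return the controller for a tab, creating it on first use.
  get(tabId: number): C {
    let controller = this.controllers.get(tabId)
    if (controller === undefined) {
      controller = this.create(tabId)
      this.controllers.set(tabId, controller)
    }
    return controller
  }

  // Make a tab the active one and return its controller.
  switchTo(tabId: number): C {
    this.activeTabId = tabId
    return this.get(tabId)
  }

  // Controller of the active tab, or null when nothing is selected.
  active(): C | null {
    return this.activeTabId === null ? null : this.get(this.activeTabId)
  }

  // Call when a tab closes (e.g. from a `chrome.tabs.onRemoved` listener).
  remove(tabId: number): void {
    this.controllers.delete(tabId)
    if (this.activeTabId === tabId) this.activeTabId = null
  }
}
```

Passing a factory into the constructor lets the registry bind each new controller to its `tabId`, which is where the extra RPC parameter from step 1 would be supplied.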
#### Persistent Sessions

To persist agent sessions:

1. Store session state in `chrome.storage.local`
2. Restore agent on extension startup
3. Handle service worker restarts gracefully

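A minimal sketch of the serialize/restore half of this idea, assuming a hypothetical `SessionSnapshot` shape with a version field. Writing the string to `chrome.storage.local` and reading it back on startup are left out.

```typescript
type AgentStatus = 'idle' | 'running' | 'completed' | 'error'

// Hypothetical snapshot format; `version` allows the shape to change later.
interface SessionSnapshot {
  version: 1
  status: AgentStatus
  task: string | null
  history: string[]
}

function serializeSession(session: Omit<SessionSnapshot, 'version'>): string {
  return JSON.stringify({ version: 1, ...session })
}

// Returns null for missing, corrupt, or unknown-version data instead of throwing.
function restoreSession(raw: string | undefined): SessionSnapshot | null {
  if (!raw) return null
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return null
  }
  const s = parsed as Partial<SessionSnapshot> | null
  if (!s || s.version !== 1 || !Array.isArray(s.history)) return null

  // A snapshot still marked "running" means the worker stopped mid-task,
  // so it is surfaced as an error rather than silently resumed.
  const status: AgentStatus = s.status === 'running' ? 'error' : s.status ?? 'idle'
  return { version: 1, status, task: s.task ?? null, history: s.history }
}
```

Mapping an interrupted `running` session to `error` is one conservative policy for step 3; resuming the task automatically is an alternative, but it would need the step loop itself to be resumable.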
#### Cross-tab Navigation

To follow links in new tabs:

1. Listen to `chrome.tabs.onCreated` events
2. Inject content script into new tabs
3. Transfer control to new tab when navigation occurs

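The decision in step 3 can be isolated into a pure function. In this sketch, `nextControlledTab` is a hypothetical helper that a `chrome.tabs.onCreated` listener could call; `TabLike` mirrors only the `id` and `openerTabId` fields of `chrome.tabs.Tab`, and the rule "follow only tabs opened by the controlled tab" is an assumption about the desired behavior.

```typescript
// Subset of `chrome.tabs.Tab` used here.
interface TabLike { id?: number; openerTabId?: number }

// Returns the tab that should be controlled after `created` appears.
function nextControlledTab(controlledTabId: number, created: TabLike): number {
  // Only follow tabs that were opened from the tab the agent is driving;
  // unrelated tabs (opened by the user elsewhere) leave control unchanged.
  const openedByControlledTab = created.openerTabId === controlledTabId
  return openedByControlledTab && created.id !== undefined ? created.id : controlledTabId
}
```

Keeping this logic free of `chrome.*` calls makes it easy to test; content script injection and the actual hand-over would happen in the listener around it.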
#### Screenshot/Vision Support

To add visual context for the agent:

1. Use `chrome.tabs.captureVisibleTab` for screenshots
2. Send images to vision-capable LLM models
3. Add screenshot tool to agent toolkit

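`chrome.tabs.captureVisibleTab` yields the screenshot as a data URL, while many LLM APIs expect the media type and base64 payload as separate fields. The sketch below shows only that conversion; `ImagePart` and `parseScreenshot` are hypothetical names, and the message format of the target LLM API is deliberately not assumed.

```typescript
// Hypothetical intermediate shape, to be mapped onto whatever the LLM API expects.
interface ImagePart { mediaType: string; base64: string }

// Split a base64 data URL such as "data:image/jpeg;base64,...." into its parts.
function parseScreenshot(dataUrl: string): ImagePart {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl)
  if (!match) throw new Error('Expected a base64 data URL')
  return { mediaType: match[1], base64: match[2] }
}
```

A screenshot tool in the agent toolkit could call the capture API, pass the result through a helper like this, and attach the image to the next model request.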
## Security Considerations

1. **API Key Storage** - Keys are stored unencrypted in `chrome.storage.local`. Web pages cannot read this storage, but the extension's own contexts can (content scripts included, by default)
2. **Content Script Isolation** - Runs in an isolated world: page scripts cannot access its JavaScript variables or functions, although both share the same DOM
3. **Message Validation** - Only trusted extension contexts can send/receive messages
4. **Permission Scope** - Request only the permissions needed for functionality

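Point 3 can be made explicit with a sender check in the Background message handler. This is a sketch under assumptions: `isTrustedSender` is a hypothetical helper, `SenderLike` mirrors the `id` and `url` fields of `chrome.runtime.MessageSender`, and the rule that `agent:*` commands must originate from an extension page (such as the side panel) is a policy chosen for illustration.

```typescript
// Subset of `chrome.runtime.MessageSender` used here.
interface SenderLike { id?: string; url?: string }

function isTrustedSender(sender: SenderLike, ownExtensionId: string, messageName: string): boolean {
  // Reject anything that does not come from this extension at all.
  if (sender.id !== ownExtensionId) return false

  // Agent commands are expected from extension pages only, not from content
  // scripts, whose sender URL is the URL of the web page they run in.
  if (messageName.startsWith('agent:')) {
    return sender.url?.startsWith(`chrome-extension://${ownExtensionId}/`) ?? false
  }
  return true
}
```

Treating content scripts as less trusted than extension pages limits what a compromised page could trigger, since a content script shares its DOM with page code.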
## Development

```bash
# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Package extension
npm run zip
```