refactor(ext): rewrite ext. totally-broken -> still-broken; THIS IS NOT WORKING
This commit is contained in:
@@ -1,247 +1,147 @@
|
||||
# PageAgentExt Architecture
|
||||
|
||||
This document describes the MV3-compliant architecture of the Chrome extension version of PageAgent.
|
||||
MV3-compliant Chrome extension architecture.
|
||||
|
||||
## Design Principles
|
||||
|
||||
The architecture follows Chrome MV3 Service Worker constraints:
|
||||
1. **Service Worker is stateless** - Only relays messages, no state
|
||||
2. **Agent runs in SidePanel** - All agent logic lives there
|
||||
3. **Unidirectional communication** - Agent → SW → Content
|
||||
4. **Storage-based coordination** - Mask state via chrome.storage
|
||||
|
||||
1. **Service Worker is stateless** - No long-running loops, no in-memory state
|
||||
2. **Agent runs in frontend context** - SidePanel hosts all agent logic
|
||||
3. **SW is a message relay** - Only forwards messages between contexts
|
||||
4. **Event-driven** - All operations are triggered by user actions or message events
|
||||
## Environments
|
||||
|
||||
## Environment Definitions
|
||||
|
||||
The extension operates across three isolated JavaScript contexts:
|
||||
|
||||
### 1. Side Panel (Frontend - Agent Host)
|
||||
### 1. Side Panel (Agent Host)
|
||||
|
||||
**Files:** `src/entrypoints/sidepanel/`
|
||||
|
||||
**Responsibilities:**
|
||||
|
||||
- Hosts `PageAgentCore` instance and main execution loop
|
||||
- Hosts `PageAgentCore` and execution loop
|
||||
- Manages `TabsManager` for multi-tab control
|
||||
- Uses `RemotePageController` to proxy DOM operations via SW
|
||||
- Stores agent state (task, history, status)
|
||||
- Provides React UI for user interaction
|
||||
- Handles `shouldShowMask` queries from content scripts
|
||||
- Uses `RemotePageController` for RPC to content script
|
||||
- Writes agent state to storage for mask coordination
|
||||
|
||||
**Key Components:**
|
||||
|
||||
- `AgentController` - Encapsulates agent lifecycle, isolated from UI
|
||||
- `useAgent` hook - React integration for AgentController
|
||||
- `App.tsx` - Main UI component
|
||||
- `ConfigPanel` - LLM settings
|
||||
- `AgentController` - Agent lifecycle, writes `agentState` to storage
|
||||
- `useAgent` hook - React integration
|
||||
- `App.tsx` - Main UI
|
||||
|
||||
**Lifecycle:** When sidepanel closes, agent disposes naturally. No state persists in SW.
|
||||
|
||||
### 2. Background (Service Worker - Stateless Relay)
|
||||
### 2. Background (Service Worker)
|
||||
|
||||
**File:** `src/entrypoints/background.ts`
|
||||
|
||||
**Responsibilities:**
|
||||
**Only two responsibilities:**
|
||||
|
||||
- Relays RPC messages from SidePanel to ContentScript
|
||||
- Forwards tab events (onRemoved, onUpdated, onActivated, onFocusChanged) to SidePanel
|
||||
- Opens sidepanel on action click
|
||||
- **NO** agent logic, **NO** state
|
||||
1. Relay `AGENT_TO_PAGE` messages to content script
|
||||
2. Broadcast `TAB_CHANGE` events
|
||||
|
||||
**Message Flows:**
|
||||
|
||||
```
|
||||
SidePanel → SW → ContentScript (RPC calls)
|
||||
ContentScript → SW → SidePanel (mask state queries)
|
||||
SW → SidePanel (tab events)
|
||||
```
|
||||
**No state, no agent logic.**
|
||||
|
||||
### 3. Content Script
|
||||
|
||||
**File:** `src/entrypoints/content.ts`
|
||||
|
||||
**Responsibilities:**
|
||||
|
||||
- Runs in web page context
|
||||
- Hosts real `PageController` instance (lazy-initialized)
|
||||
- Hosts `PageController` (lazy-initialized)
|
||||
- Handles RPC messages for DOM operations
|
||||
- Queries SidePanel for mask state on page load
|
||||
- Manages visual mask overlay
|
||||
|
||||
**Lifecycle:** PageController is created on first RPC call and disposed between tasks.
|
||||
- Polls storage every 1s for mask state
|
||||
- Uses `document.visibilityState` to manage mask visibility
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Side Panel (Frontend) │
|
||||
│ Side Panel │
|
||||
│ ┌────────────────────────────────────────────────────────────┐ │
|
||||
│ │ AgentController │ │
|
||||
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │ │
|
||||
│ │ │ PageAgentCore│ │ TabsManager │ │RemotePageController│ │ │
|
||||
│ │ └──────────────┘ └──────────────┘ └────────┬─────────┘ │ │
|
||||
│ └───────────────────────────────────────────────┼────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ │ │
|
||||
│ │ React UI │ │ Query Handler│◄─────────────┼───────────┐ │
|
||||
│ │ (App.tsx) │ │(shouldShowMask) │ │ │
|
||||
│ └──────────────┘ └──────────────┘ │ │ │
|
||||
└──────────────────────────────────────────────────┼───────────┼───┘
|
||||
│ │
|
||||
RPC Call │ Query │
|
||||
▼ │
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Background (Service Worker) │
|
||||
│ │
|
||||
│ ┌────────────────┐ │
|
||||
│ │ Message Relay │ │
|
||||
│ │ (stateless) │ │
|
||||
│ └───────┬────────┘ │
|
||||
│ │ │
|
||||
│ Tab Events ─────────────────┼─────────────────► SidePanel │
|
||||
│ (removed, updated, │ │
|
||||
│ activated, focusChanged) │ │
|
||||
└──────────────────────────────┼───────────────────────────────────┘
|
||||
│ RPC Forward
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Content Script │
|
||||
│ ┌────────────────────────────────────────────────────────────┐ │
|
||||
│ │ PageController │ │
|
||||
│ │ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │ │
|
||||
│ │ │ DOM Tree │ │ Actions │ │ Mask │ │ │
|
||||
│ │ └─────────────┘ └─────────────┘ └──────────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌───────────────┐
|
||||
│ Web Page │
|
||||
│ DOM │
|
||||
└───────────────┘
|
||||
│ │ │ │
|
||||
│ │ write agentState │ AGENT_TO_PAGE │
|
||||
│ ▼ ▼ │
|
||||
└─────────────────────────┼────────────────────────┼───────────────┘
|
||||
│ │
|
||||
┌─────────┴─────────┐ │
|
||||
│ chrome.storage │ │
|
||||
└─────────┬─────────┘ │
|
||||
│ │
|
||||
│ poll │
|
||||
│ ▼
|
||||
┌─────────────────────────┼─────────────────────────────────────────┐
|
||||
│ │ Background (SW) │
|
||||
│ │ ┌────────────────┐ │
|
||||
│ │ │ Message Relay │ │
|
||||
│ │ │ (stateless) │ │
|
||||
│ │ └───────┬────────┘ │
|
||||
│ │ │ │
|
||||
│ TAB_CHANGE broadcast ──┼─────────────┼─────────────► │
|
||||
└─────────────────────────┼─────────────┼────────────────────────────┘
|
||||
│ │ forward
|
||||
│ ▼
|
||||
┌─────────────────────────┼─────────────────────────────────────────┐
|
||||
│ Content Script │ │
|
||||
│ ┌──────────────────────┴───────────────────────────────────────┐ │
|
||||
│ │ PageController │ │
|
||||
│ │ ┌─────────────┐ ┌─────────────┐ ┌──────────────────┐ │ │
|
||||
│ │ │ DOM Tree │ │ Actions │ │ Mask (storage │ │ │
|
||||
│ │ │ │ │ │ │ polling + vis) │ │ │
|
||||
│ │ └─────────────┘ └─────────────┘ └──────────────────┘ │ │
|
||||
│ └──────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Message Protocol
|
||||
|
||||
All messages use a simple type-based protocol defined in `src/messaging/protocol.ts`.
|
||||
|
||||
### Message Types
|
||||
Only two message types:
|
||||
|
||||
| Type | Direction | Purpose |
|
||||
|------|-----------|---------|
|
||||
| `rpc:call` | SidePanel → SW | Request to call PageController method |
|
||||
| `rpc:response` | SW → SidePanel | Response from PageController |
|
||||
| `cs:rpc` | SW → ContentScript | Forwarded RPC call |
|
||||
| `cs:query` | ContentScript → SW | Query to SidePanel (e.g., shouldShowMask) |
|
||||
| `query:response` | SW → ContentScript | Response to query |
|
||||
| `tab:event` | SW → SidePanel | Tab events (removed/updated/activated/focusChanged) |
|
||||
| `AGENT_TO_PAGE` | SidePanel → SW → Content | RPC call to PageController |
|
||||
| `TAB_CHANGE` | SW → All | Tab events broadcast |
|
||||
|
||||
### RPC Methods
|
||||
|
||||
All PageController methods are available via RPC:
|
||||
|
||||
- State: `getCurrentUrl`, `getLastUpdateTime`, `getBrowserState`
|
||||
- DOM: `updateTree`, `cleanUpHighlights`
|
||||
- Actions: `clickElement`, `inputText`, `selectOption`, `scroll`, `scrollHorizontally`, `executeJavascript`
|
||||
- Mask: `showMask`, `hideMask`
|
||||
- Lifecycle: `dispose`
|
||||
|
||||
## Communication Flow
|
||||
## Mask Management
|
||||
|
||||
### Task Execution
|
||||
Mask visibility is managed autonomously by content script via storage polling.
|
||||
|
||||
```
|
||||
1. User enters task in SidePanel
|
||||
└─> AgentController.execute(task)
|
||||
### Storage State
|
||||
|
||||
2. AgentController creates agent instances
|
||||
├─> new PageAgentCore()
|
||||
├─> new TabsManager()
|
||||
└─> new RemotePageController()
|
||||
|
||||
3. Agent executes step loop:
|
||||
├─> LLM generates next action
|
||||
├─> RemotePageController.method() called
|
||||
│ └─> RPC message → SW → ContentScript
|
||||
├─> ContentScript executes on real PageController
|
||||
│ └─> Response → SW → SidePanel
|
||||
├─> Agent updates history
|
||||
└─> React UI re-renders via events
|
||||
|
||||
4. Task completes or user stops
|
||||
└─> Agent disposes, status changes
|
||||
```typescript
|
||||
interface AgentState {
|
||||
tabId: number | null // Agent's current tab
|
||||
running: boolean // Agent is executing
|
||||
}
|
||||
// Key: 'agentState'
|
||||
```
|
||||
|
||||
### Page Reload During Task
|
||||
### Content Script Logic
|
||||
|
||||
```
|
||||
1. Page reloads/navigates
|
||||
2. Content script initializes
|
||||
3. Content script queries: shouldShowMask?
|
||||
└─> cs:query → SW → SidePanel
|
||||
4. SidePanel checks: agentRunning + windowFocus + (browserActiveTab === agentCurrentTab)
|
||||
└─> query:response → SW → ContentScript
|
||||
5. Content script shows/hides mask accordingly
|
||||
```typescript
|
||||
setInterval(async () => {
|
||||
const { agentState } = await chrome.storage.local.get('agentState')
|
||||
|
||||
const shouldShow =
|
||||
agentState?.running &&
|
||||
agentState?.tabId === myTabId &&
|
||||
document.visibilityState === 'visible'
|
||||
|
||||
if (shouldShow) showMask()
|
||||
else hideMask()
|
||||
}, 1000)
|
||||
```
|
||||
|
||||
## File Structure
|
||||
### Agent Updates Storage
|
||||
|
||||
```
|
||||
packages/extension/src/
|
||||
├── agent/
|
||||
│ ├── RemotePageController.ts # Proxy for PageController RPC
|
||||
│ ├── TabsManager.ts # Multi-tab management
|
||||
│ └── tabTools.ts # Agent tools for tab control
|
||||
├── entrypoints/
|
||||
│ ├── background.ts # Stateless SW relay
|
||||
│ ├── content.ts # Content script with PageController
|
||||
│ └── sidepanel/
|
||||
│ ├── AgentController.ts # Agent lifecycle management
|
||||
│ ├── useAgent.ts # React hook for agent
|
||||
│ ├── App.tsx # Main UI component
|
||||
│ ├── components/
|
||||
│ │ ├── ConfigPanel.tsx
|
||||
│ │ ├── cards/
|
||||
│ │ └── index.tsx
|
||||
│ ├── index.html
|
||||
│ └── main.tsx
|
||||
├── messaging/
|
||||
│ ├── protocol.ts # Message type definitions
|
||||
│ ├── rpc.ts # RPC client for SidePanel
|
||||
│ └── index.ts
|
||||
├── components/ui/ # shadcn components
|
||||
├── lib/utils.ts
|
||||
└── utils/constants.ts
|
||||
```
|
||||
|
||||
## Design Decisions
|
||||
|
||||
### Why Agent in SidePanel?
|
||||
|
||||
MV3 Service Workers have strict lifecycle constraints:
|
||||
- Terminate after ~30s of inactivity
|
||||
- Cannot maintain long-running loops
|
||||
- State is lost on termination
|
||||
|
||||
By hosting the agent in SidePanel (a visible frontend page), we get:
|
||||
- Persistent execution while panel is open
|
||||
- Natural disposal when panel closes
|
||||
- No SW wake-up complexity
|
||||
|
||||
### Agent Isolation from UI
|
||||
|
||||
`AgentController` is a separate class from the React UI for:
|
||||
- **Testability** - Can test agent logic without React
|
||||
- **Portability** - Future: move agent to popup, options page, or external page
|
||||
- **Clean separation** - UI concerns don't pollute agent logic
|
||||
|
||||
### Simplified Messaging
|
||||
|
||||
Previous architecture had complex retry/wake-up logic for SW. New architecture:
|
||||
- SW is stateless, always ready
|
||||
- No ping/wake-up needed
|
||||
- Simple request-response pattern
|
||||
- Retry logic only for content script initialization
|
||||
- Task start: `{ tabId, running: true }`
|
||||
- Tab switch: `{ tabId: newTabId, running: true }`
|
||||
- Task end: `{ tabId: null, running: false }`
|
||||
|
||||
## Multi-Tab Control
|
||||
|
||||
@@ -252,69 +152,34 @@ Previous architecture had complex retry/wake-up logic for SW. New architecture:
|
||||
|
||||
### Tab Grouping
|
||||
|
||||
Agent-opened tabs are grouped in a Chrome tab group named `Task(<taskId>)`.
|
||||
Agent-opened tabs are grouped in Chrome tab group `Task(<taskId>)`.
|
||||
|
||||
### Tab Switching
|
||||
|
||||
Only initial tab and managed tabs can be switched to. This prevents the agent from accessing unrelated tabs.
|
||||
|
||||
## Mask Management
|
||||
|
||||
The visual mask overlay blocks user interaction during automation. Mask visibility is centrally controlled by `AgentController` based on three conditions:
|
||||
## File Structure
|
||||
|
||||
```
|
||||
shouldMaskBeVisible = agentRunning && windowHasFocus && (browserActiveTab === agentCurrentTab)
|
||||
packages/extension/src/
|
||||
├── agent/
|
||||
│ ├── AgentController.ts # Agent lifecycle, storage updates
|
||||
│ ├── RemotePageController.ts # RPC proxy for PageController
|
||||
│ ├── TabsManager.ts # Multi-tab management
|
||||
│ ├── protocol.ts # Message types (AGENT_TO_PAGE, TAB_CHANGE)
|
||||
│ ├── rpc.ts # RPC client
|
||||
│ ├── tabTools.ts # Agent tools for tab control
|
||||
│ └── useAgent.ts # React hook
|
||||
├── entrypoints/
|
||||
│ ├── background.ts # Stateless SW relay
|
||||
│ ├── content.ts # Content script with storage polling
|
||||
│ └── sidepanel/
|
||||
│ ├── App.tsx
|
||||
│ ├── components/
|
||||
│ ├── index.html
|
||||
│ └── main.tsx
|
||||
├── components/ui/
|
||||
└── utils/
|
||||
```
|
||||
|
||||
### Key Concepts
|
||||
|
||||
- **browserActiveTab** - The tab currently visible to the user (tracked via `chrome.tabs.onActivated`)
|
||||
- **agentCurrentTab** - The tab agent is operating on (`TabsManager.currentTabId`)
|
||||
- **windowHasFocus** - Whether browser window has focus (tracked via `chrome.windows.onFocusChanged`)
|
||||
|
||||
### State Transitions
|
||||
|
||||
| Event | Action |
|
||||
|-------|--------|
|
||||
| Agent starts | Show mask if current tab is in foreground |
|
||||
| Agent stops | Hide mask |
|
||||
| User switches to agent's tab | Show mask |
|
||||
| User switches away from agent's tab | Hide mask |
|
||||
| Window loses focus | Hide mask |
|
||||
| Window regains focus | Show mask if on agent's tab |
|
||||
| Agent switches to different tab | Sync mask based on new state |
|
||||
| Page reloads | Content script queries `shouldShowMask` |
|
||||
|
||||
### Implementation
|
||||
|
||||
- `AgentController.syncMaskState()` - Syncs mask visibility based on current state
|
||||
- `AgentController.shouldShowMaskForTab(tabId)` - Used by content script queries
|
||||
- Background forwards `activated` and `windowFocusChanged` events to SidePanel
|
||||
- `RemotePageController` does NOT auto-show mask on tab switch (controlled by AgentController)
|
||||
|
||||
## Configuration
|
||||
|
||||
LLM config (apiKey, baseURL, model) is stored in `chrome.storage.local`. This persists across sessions and is managed via the ConfigPanel.
|
||||
|
||||
## Security
|
||||
|
||||
1. **API Key Storage** - Keys in `chrome.storage.local` (extension-only access)
|
||||
1. **API Key Storage** - Keys in `chrome.storage.local`
|
||||
2. **Content Script Isolation** - Runs in isolated world
|
||||
3. **Tab Restriction** - Agent can only control tabs it opened or started from
|
||||
4. **No Arbitrary Tab Access** - Cannot switch to unmanaged tabs
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
# Install dependencies
|
||||
npm install
|
||||
|
||||
# Start development server
|
||||
npm run dev
|
||||
|
||||
# Build for production
|
||||
npm run build
|
||||
|
||||
# Package extension
|
||||
npm run zip
|
||||
```
|
||||
3. **Tab Restriction** - Agent only controls its own tabs
|
||||
|
||||
Reference in New Issue
Block a user