PageAgent 🤖
Transform any webpage into an AI-powered application with a single script tag.
PageAgent is an intelligent UI agent for web automation and DOM interaction. Built on browser-use architecture, it enables natural language control of web interfaces through LLM integration.
🌐 English | 中文
👉 📖 Documentation | 🚀 Try Demo
✨ Features
- 🎯 Easy Integration - Add to any webpage via CDN or npm
- 🔐 Client-Side Processing - No data leaves the browser
- 🧠 DOM Extraction
- 💬 Natural Language Interface
- 🎨 Human-in-the-loop UI
🗺️ Roadmap
👉 Roadmap
🚀 Quick Start
CDN Integration
TODO: CDN endpoint to be determined.
<!-- CDN script tag - URL to be updated -->
<script src="TODO-CDN-URL"></script>
NPM Installation
npm install page-agent
import { PageAgent } from 'page-agent'
const agent = new PageAgent({
  modelName: 'gpt-4.1-mini',
  baseURL: 'xxxx', // placeholder: replace with your own value
  apiKey: 'xxxx' // placeholder: replace with your own value
})
await agent.execute("Click the login button")
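A task often consists of several instructions in a row. The following is a minimal sketch, not part of the PageAgent API, of running instructions one after another and stopping at the first failure. It assumes only that the agent exposes the `execute(instruction)` method shown above; the `InstructionRunner` interface, the `StepResult` type, the `runSteps` helper, and the assumption that `execute` rejects on failure are all hypothetical.

```typescript
// Hypothetical minimal shape: only `execute` appears in this README.
// The return type and the reject-on-failure behaviour are assumptions.
interface InstructionRunner {
  execute(instruction: string): Promise<unknown>
}

interface StepResult {
  instruction: string
  ok: boolean
  error?: string
}

// Runs instructions in order and stops at the first one that throws.
async function runSteps(
  agent: InstructionRunner,
  instructions: string[],
): Promise<StepResult[]> {
  const results: StepResult[] = []
  for (const instruction of instructions) {
    try {
      await agent.execute(instruction)
      results.push({ instruction, ok: true })
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      results.push({ instruction, ok: false, error: message })
      break // later steps probably depend on this one
    }
  }
  return results
}
```

With the `agent` created above, this could be called as `await runSteps(agent, ["Click the login button", "Fill in the username"])`. Stopping at the first failure is a design choice: UI steps usually build on each other, so continuing after an error tends to act on the wrong page state.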
🏗️ Structure
PageAgent follows a clean, modular architecture:
src/
├── PageAgent.ts # Agent main loop
├── dom/ # DOM processing
├── tools/ # Agent tools
├── ui/ # UI components & panels
├── llms/ # LLM integration layer
└── utils/ # Event bus & utilities
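The tree lists an event bus under `utils/`. As an illustration of the pattern only, here is a minimal typed publish/subscribe sketch. It is not the project's implementation: the `EventBus` class, its `on` and `emit` methods, and the event names in the usage note are all hypothetical.

```typescript
type Handler<T> = (payload: T) => void

// Generic publish/subscribe bus. `Events` maps event names to payload types.
class EventBus<Events> {
  private handlers = new Map<keyof Events, Set<Handler<any>>>()

  // Registers a handler and returns a function that unsubscribes it.
  on<K extends keyof Events>(event: K, handler: Handler<Events[K]>): () => void {
    let set = this.handlers.get(event)
    if (!set) {
      set = new Set()
      this.handlers.set(event, set)
    }
    set.add(handler)
    return () => {
      this.handlers.get(event)?.delete(handler)
    }
  }

  // Calls every handler registered for `event`, in registration order.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const set = this.handlers.get(event)
    if (!set) return
    for (const handler of set) handler(payload)
  }
}
```

A bus like this lets modules such as the main loop and the UI panels communicate without importing each other. For example, with a hypothetical event map `type AgentEvents = { 'step:start': string }`, a panel could call `bus.on('step:start', label => render(label))` while the loop calls `bus.emit('step:start', 'Clicking login')`.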
🤝 Contributing
We welcome contributions from the community! Here's how to get started:
Setup
- Fork the repository
- Clone your fork:
  git clone https://github.com/alibaba/page-agent.git && cd page-agent
- Install dependencies:
  npm install
- Start development:
  npm start
Contributing Guidelines
Please read our Code of Conduct and Contributing Guide before contributing.
👏 Acknowledgments
This project builds upon the excellent work of browser-use.
PageAgent is designed for client-side web enhancement, not server-side automation.
📄 License
MIT License - see the LICENSE file for details.
DOM processing components and prompt are derived from browser-use (MIT License). See NOTICE for full attribution.
⭐ Star this repo if you find PageAgent helpful!