Update project and configurations

2026-06-11 16:28:00 +08:00
parent 12d3922091
commit a29a91867d
237 changed files with 164880 additions and 90 deletions
--- a/docs/nlu_integration_design.md
+++ b/docs/nlu_integration_design.md
@@ -0,0 +1,310 @@
+# NLU Integration Design
+
+> Status: **Confirmed, implementation in progress**
+> Related documents: `bert_integration_analysis.md`, `architecture_overview.html`
+
+---
+
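+## Part 1: Concepts (How the Two Vocabularies Map to Each Other)
+
+### 1.1 Terms used in the Canvas design
+
+In the technical-flow view of `architecture_overview.html`, the BERT NLU node is described as:
+
+```
+BERT NLU intent recognition
+Outputs domain / intent / slot / confidence
+```
+
+This is a **business-semantics-oriented** description. Each term means:
+
+| Term | Meaning | Examples |
+|---|---|---|
+| `domain` | The business domain an intent belongs to; a grouping of intents | `machine_control`, `equipment_knowledge`, `smalltalk` |
+| `intent` | What the user wants to do; a specific action within a domain | `wirecut_set_speed`, `wirecut_start_run`, `query_alarm` |
+| `slot` | A concrete parameter of the action; a key value extracted from the utterance | `speed=80`, `voltage=90`, `axis=X` |
+| `confidence` | How confident the model is in this recognition result, in the range 0 to 1 | `0.94` (high), `0.61` (medium), `0.32` (low) |
+
+Canvas routing then works as follows: once it has these four values, it checks `confidence ≥ threshold AND domain = machine_control` and, if the check passes, takes the tool-call path.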
+
+---
+
+### 1.2 Terms used in the intelligent_cabin NLU service
+
+The `intelligent_cabin` backend produces output at two layers:
+
+#### Layer A: raw output of JointBertNLU (`joint_nlu.py`)
+
+```python
+@dataclass
+class JointNluResult:
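+    intent_id: str | None       # recognized intent id, e.g. "wirecut_set_speed"
+    intent_score: float          # softmax probability, i.e. the confidence, 0~1
+    candidates: list[JointCandidate]  # top-K candidate intents with their probabilities
+    slots: dict[str, Any]        # slots extracted from the utterance, e.g. {"speed": 80}
+    slot_items: list[JointSlot]  # exact position and score of each slot in the source text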
+```
+
+Here `intent_id + intent_score + slots` corresponds to `intent + confidence + slot` in the Canvas description.
+`domain` is not output by the model directly; it is looked up from `intent_id` in `domain.yml` (`wirecut_set_speed` → domain `machine_control`).
+
+#### Layer B: decision output of Router / FusionGrader (`router.py`)
+
+```python
+from typing import Literal
+
+# Inside MultiStageIntentMatcher._build_fusion_stage(), `decision` is one of:
+decision: Literal["execute", "clarify", "route_to_cloud", "reject"]
+```
+
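+This is a **second-pass routing decision** made on top of the raw NLU result. It additionally considers:
+- whether the confidence is high enough (`score ≥ execute_score_threshold=0.55`)
+- whether the gap between the top two candidates is large enough (`margin ≥ execute_margin_threshold=0.18`)
+- whether the result is ambiguous
+
+It tells the upper layer whether the recognition result can be executed directly, not merely which intent the model predicts.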
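The three checks listed above can be sketched as a single predicate. This is only an illustration in TypeScript, not the actual `router.py` logic: the two thresholds are the values quoted in this document, while `Candidate`, `topMargin`, and `canExecute` are hypothetical names, and how a failed check maps to `clarify`, `route_to_cloud`, or `reject` is deliberately left out because the document does not specify it.

```typescript
// Thresholds quoted in this document; all names below are illustrative assumptions.
const EXECUTE_SCORE_THRESHOLD = 0.55;
const EXECUTE_MARGIN_THRESHOLD = 0.18;

type Candidate = { intentId: string; score: number };

// Gap between the best and second-best candidate score.
// With a single candidate, the margin is taken to be that candidate's score.
export function topMargin(candidates: Candidate[]): number {
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  if (sorted.length === 0) return 0;
  if (sorted.length === 1) return sorted[0].score;
  return sorted[0].score - sorted[1].score;
}

// True only when the result is unambiguous and both thresholds are met.
export function canExecute(candidates: Candidate[], ambiguous: boolean): boolean {
  if (candidates.length === 0 || ambiguous) return false;
  const best = Math.max(...candidates.map(c => c.score));
  return (
    best >= EXECUTE_SCORE_THRESHOLD &&
    topMargin(candidates) >= EXECUTE_MARGIN_THRESHOLD
  );
}
```

For instance, a top score of 0.87 with a runner-up at 0.53 passes both checks, whereas 0.60 against 0.50 fails on margin even though the score alone is above 0.55.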
+
+---
+
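+### 1.3 Full mapping between the two vocabularies
+
+```
+Canvas term                                   ←→    Actual intelligent_cabin field
+────────────────────────────────────────────────────────────────────────────────────────────
+domain                                        ←→    intent_def.domain  (looked up in domain.yml)
+intent                                        ←→    JointNluResult.intent_id
+slot                                          ←→    JointNluResult.slots  (dict)
+confidence                                    ←→    JointNluResult.intent_score  (0~1)
+
+(The rows above map NLU-layer concepts.)
+
+Canvas routing logic                          ←→    Router-layer decision field
+"high confidence + machine-control domain"    ←→    decision="execute" AND domain="machine_control"
+"knowledge domain / low-confidence fallback"  ←→    decision="route_to_cloud" or domain="equipment_knowledge"
+"smalltalk"                                   ←→    decision="reject", or handled by social_router
+```
+
+> **Key point**: Canvas currently uses a mock NLU (`src/lib/nlu/mock.ts`) that directly outputs `domain + intent + confidence + routeHint`.
+> Once the real NLU is integrated, the two projects are **wired together natively and treated as a single project**: there is no compatibility adapter layer, and the interface can change at any time.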
+
+---
+
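+## Part 2: Unified Routing Between Canvas and the NLU Service
+
+### 2.1 The two projects are merged into one (confirmed)
+
+`intelligent_cabin` is not adapted as a standalone service; it becomes a backend submodule of `ai-canvas` directly.
+The format used by the old `src/lib/nlu/mock.ts` **can be dropped**; backward compatibility is not required.
+
+### 2.2 HTTP response of the real NLU service
+
+A call to `POST /api/v1/agent/chat` returns a `ChatResponse`. The core routing-related fields are: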
+
+```json
+{
+  "session_id": "xxx",
+  "intent": "wirecut_set_speed",
+  "domain": "machine_control",
+  "decision": "execute",
+  "status": "completed",
+  "filled_slots": { "speed": 80 },
+  "routing_debug": {
+    "confidence_grade": "high",
+    "stages": [
+      { "stage": "classifier", "score": 0.87, "candidates": [...] },
+      {
+        "stage": "fusion",
+        "metadata": { "decision": "execute", "grade": "high",
+                      "classifier_signal": 0.87, "classifier_margin": 0.34 }
+      }
+    ]
+  }
+}
+```
+
+> The `domain` field still has to be added to the `ChatResponse` schema in intelligent_cabin (populated from `IntentDefinition.domain`); the change is very small.
+
+### 2.3 The NluResult type on the Canvas side (replaces mock.ts)
+
+```typescript
+// src/lib/nlu/types.ts  (new file; replaces the types in mock.ts)
+
+export type RouteHint =
+  | "tool_call"       // decision=execute in the machine_control domain
+  | "knowledge_query" // decision=route_to_cloud, or the equipment_knowledge domain
+  | "smalltalk"       // decision=reject
+  | "fallback";
+
+export type NluResult = {
+  modelVersion: string;
+  domain: string;                                        // returned directly by the backend
+  intent: string;                                        // intent_id
+  confidence: number;                                    // classifier stage score
+  slots: Record<string, string | number | boolean>;      // filled_slots
+  routeHint: RouteHint;
+  decisionGrade: "high" | "medium" | "low";
+  rawDecision: "execute" | "clarify" | "route_to_cloud" | "reject";
+};
+
+export function mapDecisionToRouteHint(
+  decision: string,
+  domain: string
+): RouteHint {
+  if (decision === "execute") {
+    if (domain === "machine_control") return "tool_call";
+    if (domain === "equipment_knowledge") return "knowledge_query";
+    return "tool_call";
+  }
+  if (decision === "route_to_cloud") return "knowledge_query";
+  if (decision === "reject") return "smalltalk";
+  return "fallback"; // clarify waits for slot filling; mapped to fallback for now
+}
+```
+
+### 2.4 Where to read confidence
+
+In `routing_debug.stages`, find the entry with `stage === "classifier"` and take its `score` field.
+This is the raw softmax probability from the BERT classifier, equivalent to `confidence` in the Canvas description.
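The lookup described in 2.4 might be written as below. This is a minimal sketch: `RoutingStage`, `ChatResponseLike`, and `readConfidence` are hypothetical names, and the response shape is reduced to the fields shown in the section 2.2 sample rather than taken from the real `ChatResponse` schema.

```typescript
// Reduced response shape, limited to the fields shown in section 2.2 (assumed, not the real schema).
type RoutingStage = { stage: string; score?: number };
type ChatResponseLike = { routing_debug?: { stages?: RoutingStage[] } };

// Returns the classifier stage score, or undefined when no classifier stage is present.
export function readConfidence(res: ChatResponseLike): number | undefined {
  const stage = res.routing_debug?.stages?.find(s => s.stage === "classifier");
  return stage?.score;
}
```

Returning `undefined` instead of a default such as `0` keeps a missing classifier stage distinguishable from a genuinely low score; the caller decides how to treat that case.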
+
+---
+
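+## Part 3: Voice Pre-Interception Pipeline (finalized)
+
+### 3.1 Design principles
+
+- **The button text of every currently visible UI component takes part in voice matching**; a hit triggers the click event directly, without calling BERT
+- The machine-tuning flow (GuidedProcedure) is currently **out of scope**, so the related 1c interception logic is not covered for now
+- BERT errors are **thrown as-is**: no degradation and no mock fallback
+
+### 3.2 Four-stage processing pipeline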
+
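+```
+ASR text input
+    │
+    ▼
+[Stage 0] Stop-word detection                            ← static word list, embedded at build time
+    ├── cancel_words hit → emit stop_action, pipeline ends
+    └── no hit ↓
+    │
+    ▼
+[Stage 1] Voice-click matching on visible UI elements    ← pure front-end rules, <1ms
+    ├── 1a. affirm/deny while session.status=waiting_confirmation (highest priority)
+    └── 1b. text match against the buttons of currently visible Artifacts
+    │   any hit → emit ActionEvent, go through the Canvas state machine, pipeline ends
+    │
+    │ no hit at all
+    ▼
+[Stage 1.5] waiting_slot + inform detection
+    ├── session.status=waiting_slot && input is a number / numeric value
+    └── hit → call the fill_slots endpoint, pipeline ends
+    │
+    ▼
+[Stage 2] BERT NLU (intelligent_cabin /api/v1/agent/chat)
+    ├── error → thrown as-is, no degradation
+    ├── decision=execute → tool-call layer (DBus) → Artifact
+    ├── decision=clarify → render the slot-filling card and wait in waiting_slot
+    ├── decision=route_to_cloud → LLM + knowledge base → KnowledgeLessonArtifact
+    └── decision=reject → LLM answers directly, nothing is written to ArtifactStore
+```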
+
+### 3.3 Priority order within stage 1
+
+```typescript
+// Priority from high to low (1c machine-tuning textAliases is not implemented for now)
+const PRIORITY_ORDER = [
+  "waiting_confirmation_affirm_deny",  // 1a
+  "visible_artifact_button",           // 1b
+];
+```
+
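+**Why 1a comes first**: when a high-risk operation (such as "start machining") pops up a confirmation card,
+an operator saying "confirm" should trigger the confirm action rather than some other button that happens to be on the canvas at the same time.
+Priority is determined by state (`session.status`), not by the text itself.
+
+### 3.4 Stage 1 matching implementation (pipeline.ts skeleton)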
+
+```typescript
+// src/lib/nlu/pipeline.ts
+
+import { AFFIRM_WORDS, CANCEL_WORDS } from "./voice-aliases.gen"; // generated at build time
+
+type ActionEvent = {
+  type: "voice_click_event" | "slot_fill_event" | "stop_action";
+  actionId?: string;
+  artifactId?: string;
+  sourceText: string;
+};
+
+export async function processVoiceInput(
+  asrText: string,
+  session: CanvasSession
+): Promise<NluResult | ActionEvent> {
+
+  // Stage 0: stop words
+  const norm = normalizeVoice(asrText);
+  if (CANCEL_WORDS.some(w => norm.includes(w))) {
+    return { type: "stop_action", sourceText: asrText };
+  }
+
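+  // Stage 1a: affirm/deny while in the waiting_confirmation state
+  if (session.status === "waiting_confirmation") {
+    if (AFFIRM_WORDS.some(w => norm.includes(w))) {
+      return { type: "voice_click_event", actionId: "confirm", sourceText: asrText };
+    }
+    // NOTE: stage 0 has already returned for any CANCEL_WORDS hit, so this deny
+    // branch is unreachable as written. Deny needs its own word list, or stage 0
+    // must skip the waiting_confirmation state.
+    if (CANCEL_WORDS.some(w => norm.includes(w))) {
+      return { type: "voice_click_event", actionId: "cancel", sourceText: asrText };
+    }
+  }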
+
+  // Stage 1b: match against the buttons of currently visible Artifacts
+  const voiceClick = matchVoiceToAction(asrText, session.visibleActions);
+  if (voiceClick) {
+    return {
+      type: "voice_click_event",
+      actionId: voiceClick.actionId,
+      artifactId: voiceClick.artifactId,
+      sourceText: asrText,
+    };
+  }
+
+  // Stage 1.5: waiting_slot + numeric input
+  if (session.status === "waiting_slot" && isNumericInput(asrText)) {
+    return { type: "slot_fill_event", sourceText: asrText };
+  }
+
+  // Stage 2: BERT NLU (errors are thrown as-is)
+  const response = await callNluService(asrText, session.sessionId);
+  return adaptNluResponse(response);
+}
+```
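The skeleton above calls `normalizeVoice`, `isNumericInput`, and `matchVoiceToAction` without defining them. One possible shape for these helpers is sketched below; everything here is an assumption made for illustration (including the `VisibleAction` type and the exact-match rule), and the real matching strategy may well be fuzzier.

```typescript
// Hypothetical shape of one entry in session.visibleActions.
type VisibleAction = { actionId: string; artifactId: string; text: string };

// NFKC folds full-width forms to ASCII; then lowercase and drop whitespace and common punctuation.
export function normalizeVoice(text: string): string {
  return text
    .normalize("NFKC")
    .toLowerCase()
    .replace(/[\s,.!?;:，。！？；：、]/g, "");
}

// True for plain Arabic-digit numbers such as "80" or "0.25".
// Chinese numerals (for example "八十") are not handled in this sketch.
export function isNumericInput(text: string): boolean {
  return /^[+-]?\d+(\.\d+)?$/.test(text.normalize("NFKC").trim());
}

// Exact match on normalized text; returns the first matching action, if any.
export function matchVoiceToAction(
  asrText: string,
  visibleActions: VisibleAction[]
): VisibleAction | undefined {
  const norm = normalizeVoice(asrText);
  if (norm === "") return undefined;
  return visibleActions.find(a => normalizeVoice(a.text) === norm);
}
```

Exact matching is used for button text so that a short label is not triggered by a longer sentence that merely contains it; the substring matching in stages 0 and 1a is a separate choice made by the skeleton itself.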
+
+### 3.5 voice_aliases configuration (confirmed: static build)
+
+**Word list location**: `intelligent_cabin/config/voice_aliases.yml` (next to `dialog_acts.yml`)
+
+```yaml
+# voice_aliases.yml
+affirm_words: ["确认", "好的", "执行", "是", "对", "继续", "好", "ok"]
+cancel_words: ["取消", "算了", "不要", "不用", "停止", "停"]
+
+# Industrial-control aliases (grouped by intent_id; used for Artifact voiceActions in stage 1b)
+intent_aliases:
+  wirecut_start_run:  ["开始", "启动", "加工", "跑起来"]
+  wirecut_stop_run:   ["停", "停机", "急停", "停止"]
+  wirecut_home_all:   ["回零", "归零", "回原点"]
+  wirecut_pause_run:  ["暂停", "变频暂停"]
+```
+
+**Generated at build time**: the build script reads the yml and generates `src/lib/nlu/voice-aliases.gen.ts`,
+which the TypeScript side imports directly, so no runtime HTTP request is needed.
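The rendering half of such a build script could look roughly like this. It is a sketch under assumptions: the yml is taken to be already parsed into a plain object (the choice of YAML parser is left open), `renderVoiceAliasesModule` and `INTENT_ALIASES` are hypothetical names, and only `AFFIRM_WORDS` and `CANCEL_WORDS` are export names that appear in this document.

```typescript
// Shape of the parsed voice_aliases.yml, following the keys shown above.
type VoiceAliases = {
  affirm_words: string[];
  cancel_words: string[];
  intent_aliases: Record<string, string[]>;
};

// Renders the source text of voice-aliases.gen.ts.
// JSON.stringify keeps non-ASCII words intact and escapes quotes safely.
export function renderVoiceAliasesModule(aliases: VoiceAliases): string {
  return [
    "// AUTO-GENERATED from voice_aliases.yml. Do not edit by hand.",
    `export const AFFIRM_WORDS: readonly string[] = ${JSON.stringify(aliases.affirm_words)};`,
    `export const CANCEL_WORDS: readonly string[] = ${JSON.stringify(aliases.cancel_words)};`,
    `export const INTENT_ALIASES: Readonly<Record<string, readonly string[]>> = ${JSON.stringify(aliases.intent_aliases)};`,
    "",
  ].join("\n");
}
```

The build step would then write the returned string to `src/lib/nlu/voice-aliases.gen.ts`, which keeps the yml as the single source of truth for both the Python and TypeScript sides.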
+
+---
+
+## Part 4: Next Implementation Steps
+
+| Step | Location | Work item |
+|---|---|---|
+| 1 | `intelligent_cabin` Python side | Add a `domain` field to the `ChatResponse` schema and populate it in `agent_service.py` |
+| 2 | `intelligent_cabin/config/` | Create `voice_aliases.yml` and add the industrial-control aliases |
+| 3 | `src/lib/nlu/` | Create `types.ts`; retire the old types in `mock.ts` |
+| 4 | `src/lib/nlu/` | Create `pipeline.ts` and implement the four-stage pipeline |
+| 5 | `src/lib/artifacts/types.ts` | Add a `voiceActions` field to each Artifact type |
+| 6 | Build configuration | Add the yml → ts generation script (`voice-aliases.gen.ts`) |