From ea6cf23add36cc5fa4756b7baf3460d1ef3e8020 Mon Sep 17 00:00:00 2001 From: Scott Hanselman Date: Sun, 26 Apr 2026 22:26:56 -0700 Subject: [PATCH] docs: update Windows node parity audit --- README.md | 21 ++++--- docs/WINDOWS_NODE_ARCHITECTURE.md | 95 ++++++++++++++++--------------- docs/WINDOWS_NODE_TESTING.md | 28 +++++---- docs/gateway-node-integration.md | 27 ++++++--- 4 files changed, 99 insertions(+), 72 deletions(-) diff --git a/README.md b/README.md index 48a8171..8e9961d 100644 --- a/README.md +++ b/README.md @@ -203,17 +203,17 @@ When Node Mode is enabled in Settings, your Windows PC becomes a **node** that t "canvas.snapshot", "canvas.a2ui.push", "canvas.a2ui.reset", - "screen.snapshot", - "camera.list", - "camera.snap", - "camera.clip", - "location.get" + "screen.snapshot", + "camera.list", + "camera.snap", + "camera.clip", + "location.get" ] - } - } + } + } } ``` - > ⚠️ **Important**: The gateway has a server-side allowlist. Commands must be listed explicitly - wildcards like `canvas.*` don't work! Privacy-sensitive commands such as `screen.record` should only be added when you explicitly want to allow them. + > ⚠️ **Important**: The gateway has a server-side allowlist. Commands must be listed explicitly - wildcards like `canvas.*` don't work! Privacy-sensitive commands such as `screen.record` should only be added to `allowCommands` when you explicitly want to allow them. 5. **Test it** from your Mac/gateway: ```bash @@ -227,11 +227,14 @@ When Node Mode is enabled in Settings, your Windows PC becomes a **node** that t openclaw nodes canvas eval --node --javaScript "document.title" # Render A2UI JSONL in the canvas (pass the file contents as a string) - openclaw nodes canvas a2ui push --node --jsonl "$(Get-Content -Raw .\\ui.jsonl)" + openclaw nodes canvas a2ui push --node --jsonl "$(cat ./ui.jsonl)" # Take a screenshot openclaw nodes invoke --node --command screen.snapshot --params '{"screenIndex":0,"format":"png"}' + # Record a short screen clip (requires explicitly allowing screen.record on the gateway) + openclaw nodes screen record --node --duration 3000 --fps 10 --screen 0 --no-audio --out /tmp/openclaw-windows-screen-record-test.mp4 --json + # List cameras openclaw nodes invoke --node --command camera.list diff --git a/docs/WINDOWS_NODE_ARCHITECTURE.md b/docs/WINDOWS_NODE_ARCHITECTURE.md index f549f59..890ed9b 100644 --- a/docs/WINDOWS_NODE_ARCHITECTURE.md +++ b/docs/WINDOWS_NODE_ARCHITECTURE.md @@ -242,12 +242,13 @@ Niche scenario. If the "server" must be Windows for some reason, this works but ## Capability Matrix by Node Type -| Capability | macOS App | iOS App | Android App | WSL2 Headless | **Windows Tray (proposed)** | Windows API | +| Capability | macOS App | iOS App | Android App | WSL2 Headless | **Windows Tray** | Windows API | |-----------|-----------|---------|-------------|---------------|---------------------------|-------------| | `canvas.present` | ✅ SwiftUI WebView | ✅ WKWebView | ✅ WebView | ❌ | **✅ WebView2** | WebView2 | | `canvas.snapshot` | ✅ | ✅ | ✅ | ❌ | **✅** | WebView2 CapturePreviewAsync | | `canvas.eval` | ✅ | ✅ | ✅ | ❌ | **✅** | WebView2 ExecuteScriptAsync | -| `canvas.a2ui` | ✅ | ✅ | ✅ | ❌ | **⚠️ Investigating** | WebView2 | +| `canvas.a2ui.push/reset` | ✅ | ✅ | ✅ | ❌ | **✅** | WebView2 | +| `canvas.a2ui.pushJSONL` | ✅ | ✅ | ✅ | ❌ | ❌ | Legacy alias not yet implemented | | `camera.snap` | ✅ AVFoundation | ✅ AVFoundation | ✅ CameraX | ❌ | **✅** | MediaCapture + frame reader fallback | | `camera.clip` | ✅ | ✅ | ✅ | ❌ | **✅** | MediaCapture + MediaEncoding | | `camera.list` | ✅ | ✅ | ✅ | ❌ | **✅** | DeviceInformation.FindAllAsync | @@ -255,7 +256,7 @@ Niche scenario. If the "server" must be Windows for some reason, this works but | `system.run` | ✅ | ❌ | ❌ | ✅ | **✅** | Process.Start (cmd/pwsh) + ExecApprovalPolicy | | `system.execApprovals` | ❌ | ❌ | ❌ | ❌ | **✅** | JSON policy file (exec-policy.json) | | `system.notify` | ✅ NSUserNotification | ✅ UNUserNotification | ✅ NotificationManager | ❌ | **✅** | ToastNotificationManager | -| `location.get` | ✅ CLLocationManager | ✅ CLLocationManager | ✅ FusedLocation | ❌ | **⚠️** | Windows.Devices.Geolocation | +| `location.get` | ✅ CLLocationManager | ✅ CLLocationManager | ✅ FusedLocation | ❌ | **✅** | Windows.Devices.Geolocation | | `sms.send` | ❌ | ❌ | ✅ | ❌ | ❌ | N/A | | Browser proxy | ✅ | ❌ | ❌ | ✅ Playwright | **⚠️ Future** | Playwright on Windows | | Accessibility | ✅ AX API | ❌ | ❌ | ❌ | **⚠️ Future** | UI Automation | @@ -270,7 +271,7 @@ For contributors: here's what implementing a Windows node means at the protocol ### 1. Connect as a node -The tray app's `OpenClawGatewayClient` currently connects as an **operator**. To become a node, it needs to send (or send an additional) `connect` with `role: "node"`: +The tray app uses a dedicated node connection (`WindowsNodeClient`) with `role: "node"`: ```json { @@ -294,8 +295,9 @@ The tray app's `OpenClawGatewayClient` currently connects as an **operator**. To "canvas.eval", "canvas.snapshot", "canvas.a2ui.push", "canvas.a2ui.reset", "camera.list", "camera.snap", "camera.clip", - "screen.record", - "system.run", "system.notify", + "screen.snapshot", "screen.record", + "location.get", + "system.run", "system.run.prepare", "system.which", "system.notify", "system.execApprovals.get", "system.execApprovals.set" ], "permissions": { @@ -564,7 +566,7 @@ The node protocol requires a stable device identity (`device.id`) derived from a - [x] `system.notify` — agent can request Windows toast notifications - [x] `canvas.present` / `canvas.hide` — floating WebView2 canvas window - [x] `canvas.navigate` / `canvas.eval` / `canvas.snapshot` — full canvas support -- [ ] `canvas.a2ui.push` / `canvas.a2ui.reset` — A2UI rendering (investigating: agent tool policy blocks) +- [x] `canvas.a2ui.push` / `canvas.a2ui.reset` — A2UI rendering - [x] `system.run` — exec commands on Windows (PowerShell/cmd) with ICommandRunner abstraction - [x] `system.execApprovals.get/set` — remote-manageable exec approval policy - [ ] Settings UI for node capabilities (enable/disable camera, screen, etc.) @@ -577,7 +579,7 @@ The node protocol requires a stable device identity (`device.id`) derived from a - [x] `camera.list` — enumerate Windows cameras (DeviceInformation.FindAllAsync) - [x] `camera.snap` — capture photo from webcam (MediaCapture + frame reader fallback) -- [ ] `camera.clip` — record short video clip (MediaCapture + MediaEncoding) +- [x] `camera.clip` — record short video clip (MediaCapture + MediaEncoding) - [x] `screen.record` — capture Windows desktop via Graphics Capture API - [x] `screen.snapshot` — screenshot via Windows.Graphics.Capture - [x] Permission prompts (camera: UnauthorizedAccessException → toast; future MSIX consent) @@ -597,7 +599,7 @@ The node protocol requires a stable device identity (`device.id`) derived from a ### Phase 4: Feature Parity + Polish **Priority: LOW | Effort: Medium | Impact: Medium** -- [ ] `location.get` — Windows Location API +- [x] `location.get` — Windows Location API - [ ] TTS / Speech Synthesis - [ ] Microphone / voice input - [ ] Browser proxy (Playwright on Windows, launched by tray app) @@ -613,25 +615,27 @@ The node protocol requires a stable device identity (`device.id`) derived from a ``` OpenClaw.Shared/ -├── OpenClawGatewayClient.cs ← existing operator client -├── OpenClawNodeClient.cs ← NEW: node protocol handler -├── INodeCommandHandler.cs ← NEW: interface for command dispatch -├── NodeIdentity.cs ← NEW: keypair + device ID -└── Models/ - ├── NodeConnectParams.cs ← NEW - ├── NodeInvokeRequest.cs ← NEW - └── NodeInvokeResponse.cs ← NEW +├── OpenClawGatewayClient.cs ← operator client +├── WindowsNodeClient.cs ← node protocol handler +├── DeviceIdentity.cs ← Ed25519 keypair + device token +├── NodeCapabilities.cs ← command/capability interfaces +└── Capabilities/ + ├── CanvasCapability.cs + ├── CameraCapability.cs + ├── ScreenCapability.cs + ├── LocationCapability.cs + └── SystemCapability.cs -OpenClaw.Tray/ +OpenClaw.Tray.WinUI/ ├── Services/ -│ ├── NodeService.cs ← NEW: orchestrates node connection -│ ├── CanvasService.cs ← NEW: handles canvas.* commands -│ ├── CameraService.cs ← NEW: handles camera.* commands -│ ├── ScreenService.cs ← NEW: handles screen.* commands -│ ├── SystemService.cs ← NEW: handles system.* commands -│ └── ExecApprovals.cs ← NEW: local approval store +│ ├── NodeService.cs ← orchestrates node connection +│ ├── CameraCaptureService.cs +│ ├── ScreenCaptureService.cs +│ ├── ScreenRecordingService.cs +│ ├── LocalCommandRunner.cs +│ └── SettingsManager.cs ├── Windows/ -│ ├── CanvasWindow.xaml ← NEW: floating WebView2 canvas +│ ├── CanvasWindow.xaml ← floating WebView2 canvas │ └── CanvasWindow.xaml.cs ``` @@ -648,12 +652,15 @@ Tray App Start │ └─ (existing functionality) │ └─ Connect WS #2: role=node - ├─ Advertise caps: [canvas, camera, screen, system, notifications] - ├─ Advertise commands: [canvas.*, camera.*, screen.*, system.*] + ├─ Advertise caps: [canvas, camera, location, screen, system] + ├─ Advertise commands: [canvas.*, camera.*, location.get, screen.*, system.*] ├─ Handle node.invoke requests │ ├─ canvas.present → show/navigate CanvasWindow │ ├─ canvas.snapshot → WebView2 CapturePreview │ ├─ camera.snap → MediaCapture → JPEG → base64 + │ ├─ camera.clip → MediaCapture → MP4 → base64 + │ ├─ location.get → Windows.Devices.Geolocation + │ ├─ screen.snapshot → GraphicsCapture → image base64 │ ├─ screen.record → GraphicsCapture → MP4 → base64 │ ├─ system.run → Process.Start → stdout/stderr │ └─ system.notify → ToastNotification @@ -668,29 +675,29 @@ This is a big effort and **contributions are very welcome!** Here's how to get s ### Good First Issues -1. **Device identity module** — Generate Ed25519 keypair, store in `%APPDATA%`, derive fingerprint. Pure crypto, well-defined scope. -2. **`system.notify` handler** — Accept title + body + priority, show a Windows toast. The tray app already shows toasts — this just adds the node protocol wrapper. -3. **`system.run` handler** — Execute a command via `Process.Start`, return stdout/stderr/exit code. Add exec approvals. +1. **`canvas.a2ui.pushJSONL` alias** — Route the legacy Mac-compatible alias through the existing A2UI push path. +2. **Device status command** — Add `device.info` / `device.status` with OS, version, host, and basic availability details. +3. **Capability diagnostics copy** — Add a copyable summary that explains declared commands, gateway allowlist status, and dangerous-command opt-ins. ### Medium Issues -4. **Node protocol client** (`OpenClawNodeClient`) — WebSocket connect with `role: "node"`, handle `node.invoke` dispatch. Builds on the existing `OpenClawGatewayClient`. -5. **Canvas floating window** — WebView2 in a borderless/floating window that appears on `canvas.present` and hides on `canvas.hide`. Related: #5. +4. **Browser proxy parity** — Investigate a safe Windows implementation for Mac-compatible `browser.proxy`. +5. **Gateway/channel flyout** — Show configured/running/error/probe state for channels and gateway health in the tray. ### Harder Issues -6. **Camera capture** — `Windows.Media.Capture` for photos and video clips. Handle permissions, multiple cameras, front/back mapping. -7. **Screen recording** — `Windows.Graphics.Capture` for screen recording. Handle multi-monitor, permission consent, encoding to MP4. -8. **Native Windows gateway audit** — Run `openclaw gateway` on Windows, identify and fix platform-specific failures. +6. **Voice mode parity** — Review the open Windows Voice Mode PR against the current Mac voice runtime/controller/session split. +7. **Native Windows gateway audit** — Run `openclaw gateway` on Windows, identify and fix platform-specific failures. +8. **Richer channel operations** — Add tray surfaces for channel configuration, probe status, token source, last error, and recovery actions. ### Development Setup -See #7 / #8 for DEVELOPMENT.md. Quick start: -```bash +See `DEVELOPMENT.md`. Quick start: +```powershell git clone https://github.com/shanselman/openclaw-windows-hub.git cd openclaw-windows-hub -dotnet build -dotnet run --project src/OpenClaw.Tray +.\build.ps1 +dotnet run --project src\OpenClaw.Tray.WinUI\OpenClaw.Tray.WinUI.csproj ``` Requires .NET 10.0 SDK, Windows 10/11. For testing node protocol, you'll need a running OpenClaw gateway (in WSL2 or on another machine). @@ -699,12 +706,10 @@ Requires .NET 10.0 SDK, Windows 10/11. For testing node protocol, you'll need a ## Open Questions -- [ ] Does the gateway protocol support dual-role connections, or must we open two WebSockets? -- [ ] What's the minimum `PROTOCOL_VERSION` the node connect needs? (Currently 3) -- [ ] Should exec from a Windows node default to PowerShell or cmd.exe? -- [ ] How should the tray app handle "node in background" — Windows can suspend tray apps. Do we need a background service? -- [ ] Can the Graphics Capture API work without a visible window / user picker? (Background capture requires Windows 11+) -- [ ] Should we pursue MSIX packaging for the tray app to unlock restricted capabilities? +- [ ] Should Windows implement `device.info` / `device.status` before or after browser proxy parity? +- [ ] Should dangerous command opt-ins be shown in the tray as a guided repair flow, a docs link, or both? +- [ ] How much channel management should live in the native tray versus opening the web dashboard? +- [ ] Should Voice Mode land as a separate parity track after the open PR is reviewed against current Mac architecture? --- diff --git a/docs/WINDOWS_NODE_TESTING.md b/docs/WINDOWS_NODE_TESTING.md index 59ecdce..db311d7 100644 --- a/docs/WINDOWS_NODE_TESTING.md +++ b/docs/WINDOWS_NODE_TESTING.md @@ -2,7 +2,7 @@ ## Overview -The Windows Node feature allows the tray app to receive commands from the OpenClaw agent (canvas, screenshots, notifications). This is **experimental** and must be explicitly enabled in Settings. +The Windows Node feature allows the tray app to receive commands from the OpenClaw agent (canvas, screenshots, screen recordings, camera, location, notifications, and controlled command execution). This is **experimental** and must be explicitly enabled in Settings. ## How to Enable @@ -25,8 +25,8 @@ The Windows Node feature allows the tray app to receive commands from the OpenCl ``` [INFO] Starting Windows Node connection to ws://... [INFO] Node connected, waiting for challenge... - [INFO] Sent node registration with X capabilities, Y commands - [INFO] Node registered successfully! + [INFO] Registered capability: screen (2 commands) + [INFO] All capabilities registered [INFO] Node status: Connected ``` @@ -45,22 +45,28 @@ These features need the gateway to send `node.invoke` commands: | `canvas.eval` | Execute JavaScript | Runs JS in canvas, returns result | | `canvas.snapshot` | Capture canvas | Returns base64 PNG of canvas content | | `screen.snapshot` | Take screenshot | Captures screen, shows notification, returns base64 | +| `screen.record` | Record short screen clip | Returns MP4/base64 metadata; requires explicit gateway allowlist | | `system.notify` | Show notification | Displays toast notification | +| `system.run` / `system.which` | Controlled command execution | Uses local exec approval policy | | `camera.list` | Enumerate cameras | Returns device IDs and names | | `camera.snap` | Capture photo | Returns base64 image (NV12 fallback) | +| `camera.clip` | Capture video clip | Returns MP4/base64 metadata | +| `location.get` | Get Windows location | Uses Windows location permission/settings | ## Capabilities Advertised When the node connects, it advertises these capabilities: - `canvas` - WebView2-based canvas window -- `screen` - Screen capture via GDI +- `screen` - Screen snapshot and recording via Windows.Graphics.Capture - `system` - Notifications, command execution (`system.run`, `system.run.prepare`, `system.which`), exec approval policy -- `camera` - MediaCapture photo capture (frame reader fallback) +- `camera` - MediaCapture photo/video capture (frame reader fallback) +- `location` - Windows.Devices.Geolocation ## Security Features - **URL Validation**: Canvas blocks `file://`, `javascript:`, localhost, private IPs, IPv6 localhost -- **Screen Capture Notification**: User is notified when screen is captured +- **Screen Capture Notification**: User is notified when screen snapshots are captured +- **Screen Recording Allowlist**: `screen.record` must be explicitly allowed by the gateway and does not leave a hidden local MP4 copy on Windows - **Node Mode Toggle**: Must be explicitly enabled by user - **Command Validation**: Only alphanumeric commands with dots/hyphens allowed @@ -92,10 +98,10 @@ When the node connects, it advertises these capabilities: - `system.execApprovals` allowlist flow 2. ~~**screen.record**~~ ✅ Implemented - Graphics Capture video recording (MP4/base64) -3. **camera.clip** +3. ~~**camera.clip**~~ ✅ Implemented - Short webcam video capture (MediaCapture + encoding) -4. **A2UI end-to-end** - - Resolve tool policy/allowlist and validate JSONL rendering +4. **A2UI pushJSONL alias** + - Windows supports `canvas.a2ui.push` and `canvas.a2ui.reset`; Mac also supports legacy `canvas.a2ui.pushJSONL` 5. **Packaging & consent prompts** - MSIX packaging with camera/screen capabilities for system prompts 6. **Test matrix & polish** @@ -107,5 +113,7 @@ When the node connects, it advertises these capabilities: - `src/OpenClaw.Shared/WindowsNodeClient.cs` - Node protocol client - `src/OpenClaw.Shared/Capabilities/*.cs` - Capability handlers - `src/OpenClaw.Tray.WinUI/Services/NodeService.cs` - Orchestrates capabilities -- `src/OpenClaw.Tray.WinUI/Services/ScreenCaptureService.cs` - GDI screen capture +- `src/OpenClaw.Tray.WinUI/Services/ScreenCaptureService.cs` - screen snapshots +- `src/OpenClaw.Tray.WinUI/Services/ScreenRecordingService.cs` - screen recordings +- `src/OpenClaw.Tray.WinUI/Services/CameraCaptureService.cs` - camera photo/video capture - `src/OpenClaw.Tray.WinUI/Windows/CanvasWindow.xaml` - WebView2 canvas diff --git a/docs/gateway-node-integration.md b/docs/gateway-node-integration.md index ebb420f..12f9ba5 100644 --- a/docs/gateway-node-integration.md +++ b/docs/gateway-node-integration.md @@ -1,6 +1,6 @@ # OpenClaw Gateway ↔ Windows Node Integration Guide -> Last updated: 2026-04-25 +> Last updated: 2026-04-26 > Source of truth: [`openclaw/openclaw` — `src/gateway/node-command-policy.ts`](https://github.com/openclaw/openclaw/blob/main/src/gateway/node-command-policy.ts) This document captures everything we've learned about how the OpenClaw gateway handles node commands, platform allowlists, and the QR bootstrap pairing flow. It exists because these details are not obvious from the docs alone and caused real debugging sessions. @@ -138,7 +138,13 @@ Our node previously registered `screen.list`. This command does not exist in the **Fixed locally**: `screen.list` is no longer advertised. -### 2.3 Verified Correct Names +### 2.3 `screen.record.start` / `screen.record.stop` — Not Mac/Gateway Commands + +PR #159 originally explored session-based start/stop recording commands, but the current Mac node and gateway command surface only define fixed-duration `screen.record`. + +**Fixed locally**: Windows now implements only fixed-duration `screen.record`; `screen.record.start` and `screen.record.stop` are intentionally not advertised. + +### 2.4 Verified Correct Names | Our Command | Gateway Canonical | Status | |-------------|-------------------|--------| @@ -160,15 +166,17 @@ Our node previously registered `screen.list`. This command does not exist in the | `canvas.a2ui.reset` | `canvas.a2ui.reset` | ✅ Match | | `screen.record` | `screen.record` | ✅ Match (dangerous) | -### 2.4 Commands We're Missing vs macOS +### 2.5 Remaining Command Gaps vs Current Mac Node | Command | macOS | Windows | Notes | |---------|-------|---------|-------| -| `canvas.a2ui.pushJSONL` | ✅ (in gateway allowlist) | ❌ | Not widely used | -| `device.info` | ✅ | ❌ | Hardware/OS info | -| `device.status` | ✅ | ❌ | Battery/charging status | +| `canvas.a2ui.pushJSONL` | ✅ | ❌ | Legacy alias for `canvas.a2ui.push`; easy parity follow-up | | `browser.proxy` | ✅ | ❌ | Chrome DevTools proxy | +### 2.6 Safe Gateway-Policy Gaps to Consider + +The gateway's macOS/iOS default allowlists include safe device-info commands (`device.info`, `device.status`) and other mobile-oriented commands. Windows does not currently implement those. They are good future parity candidates, but they are separate from the current Mac runtime's core canvas/camera/location/screen/system/browser command set. + --- ## 3. Platform Detection @@ -333,7 +341,8 @@ Until the gateway expands Windows safe defaults, the practical local solution is - [x] Rename `screen.capture` → `screen.snapshot` in `ScreenCapability.cs` - [x] Remove `screen.list` from declared commands -- [ ] Remove debug logging from `WindowsNodeClient.cs` (done) +- [x] Remove debug logging from `WindowsNodeClient.cs` +- [x] Add Mac-compatible fixed-duration `screen.record`; do not add `screen.list` or record start/stop commands ### 5.2 Setup Wizard Improvements @@ -347,7 +356,7 @@ Until the gateway expands Windows safe defaults, the practical local solution is - [ ] **Request Windows/macOS parity for safe declared commands** — Windows should allow the same safe companion commands macOS does, while dangerous commands stay explicit opt-in. - [ ] **Document `gateway.nodes.allowCommands`** — it's not in the config reference page -- [ ] **Consider `canvas.a2ui.pushJSONL`** — it's in the gateway allowlist but we don't implement it +- [ ] **Consider `canvas.a2ui.pushJSONL`** — current Mac supports it as a legacy JSONL alias; Windows implements `canvas.a2ui.push` and `canvas.a2ui.reset` #### Upstream issue draft @@ -398,6 +407,8 @@ When shipping the Windows node, README/wiki should tell users: > openclaw gateway restart > ``` > Then re-pair the node (`openclaw devices reject ` + re-approve). +> +> Add `screen.record` only when you explicitly want to allow privacy-sensitive screen recording. ---