Organize docs and add Windows Node architecture roadmap

- Move CODE_REVIEW.md, TEST_COVERAGE.md, XAML_COMPILER_BUG.md to docs/
- Add WINDOWS_NODE_ARCHITECTURE.md - comprehensive plan for evolving
  tray app into a first-class Windows node with canvas, camera, screen
  capture, and more
This commit is contained in:
Scott Hanselman 2026-02-01 21:26:10 -08:00
parent e7068351a3
commit c658b1f517
4 changed files with 708 additions and 0 deletions

View File

@ -0,0 +1,708 @@
# 🏗️ Architecture: Windows Platform Strategy & Native Node Roadmap
## Summary
OpenClaw has **excellent** macOS support — the native menubar app runs as a full node with camera, canvas, screen capture, notifications, location, system exec, and more. Windows users today rely on **WSL2** for the gateway and get a limited experience: no native UI integration, no camera, no canvas surface, and NAT networking quirks.
This issue proposes a comprehensive Windows platform strategy that evolves `OpenClaw.Tray` from a gateway *client* into a **native Windows node** — giving the agent eyes, hands, and a voice on Windows, and eventually exploring a fully native Windows gateway.
**This is the umbrella issue for the Windows platform story.** It maps every deployment scenario, identifies capability gaps, proposes a phased roadmap, and provides enough technical detail for contributors to pick up work items.
Related issues: #5 (Canvas Panel), #6 (Skills Settings UI), #7 (DEVELOPMENT.md), #9 (WebView2 ARM64)
---
## Table of Contents
- [Current State](#current-state)
- [The Vision](#the-vision)
- [Deployment Scenario Matrix](#deployment-scenario-matrix)
- [Capability Matrix by Node Type](#capability-matrix-by-node-type)
- [Node Protocol Overview](#node-protocol-overview)
- [Windows API Mapping](#windows-api-mapping)
- [Architectural Questions](#architectural-questions)
- [Phased Roadmap](#phased-roadmap)
- [Technical Deep Dives](#technical-deep-dives)
- [Contributing](#contributing)
---
## Current State
### What exists today
| Component | Status | Details |
|-----------|--------|---------|
| `OpenClaw.Shared` | ✅ Working | Gateway WebSocket client library (.NET) |
| `OpenClaw.Tray` | ✅ Working | System tray app — status, Quick Send, WebChat (WebView2), toast notifications, channel control |
| `OpenClaw.CommandPalette` | ✅ Working | PowerToys extension for quick commands |
| Windows Node | ❌ Missing | Tray app is a *client/operator*, not a *node* |
| Windows Gateway | ❌ Unexplored | Gateway runs in WSL2 only |
### How Scott uses it today
```
┌─────────────────────────────────────────────────┐
│ Mac mini (gateway host) │
│ ┌───────────────────────────────────────────┐ │
│ │ openclaw gateway (ws://127.0.0.1:18789) │ │
│ │ macOS native node (camera, canvas, screen) │ │
│ └───────────────────────────────────────────┘ │
└───────────────────────┬─────────────────────────┘
│ Tailnet / LAN
┌───────────────────────┴─────────────────────────┐
│ Windows PC │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ WSL2 (Ubuntu) │ │ OpenClaw.Tray │ │
│ │ openclaw node run │ │ (WS operator only) │ │
│ │ headless: exec only│ │ Quick Send, Chat │ │
│ └────────────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────┘
```
The Windows PC has **two connections** to the Mac gateway: a headless WSL2 node (exec-only) and the tray app (operator client). But the agent **cannot**:
- Show a canvas on Windows
- Take screenshots of the Windows desktop
- Capture from a Windows webcam
- Send native Windows notifications (from the agent, vs. from the tray app's event listener)
- Get the Windows machine's location
---
## The Vision
```
┌──────────────────────────────────────────────────────┐
│ Gateway Host (Mac, Linux, WSL2, or Windows native) │
│ openclaw gateway (ws://...) │
└─────────────┬────────────────────────────────────────┘
┌─────────┼──────────┬──────────────┬──────────────┐
│ │ │ │ │
┌─┴──┐ ┌──┴───┐ ┌───┴────┐ ┌─────┴─────┐ ┌────┴────┐
│ Mac│ │iPhone│ │Android │ │ Windows │ │ Linux │
│Node│ │ Node │ │ Node │ │ Node │ │ Node │
│ ★★★│ │ ★★ │ │ ★★★ │ │ ★★★★ │ │ ★ │
│ │ │ │ │ │ │(Tray App) │ │(headless│
└────┘ └──────┘ └────────┘ └───────────┘ └─────────┘
Legend: ★ = capability breadth (more = richer)
```
The tray app becomes **a first-class OpenClaw node** that registers with `role: "node"` and advertises capabilities using Windows-native APIs. No WSL2 required for the node — only potentially for the gateway (or not at all if we pursue native Windows gateway).
---
## Deployment Scenario Matrix
### Scenario 1: Mac Only ⭐⭐⭐⭐⭐
| Aspect | Details |
|--------|---------|
| **Gateway** | macOS native (Node.js) |
| **Nodes** | macOS native app (full capabilities) |
| **Capabilities** | Camera ✅ Canvas ✅ Screen ✅ Notifications ✅ Browser ✅ Exec ✅ Location ✅ Audio/TTS ✅ Accessibility ✅ AppleScript ✅ |
| **Networking** | Loopback, zero config |
| **Setup complexity** | `openclaw onboard --install-daemon` → done |
| **UX Rating** | ⭐⭐⭐⭐⭐ Best possible experience |
The gold standard. Everything works out of the box. This is what Windows should feel like.
---
### Scenario 2: Windows Only — WSL2 Gateway + WSL2 Node ⭐⭐
| Aspect | Details |
|--------|---------|
| **Gateway** | WSL2 (Ubuntu) |
| **Nodes** | WSL2 headless node (exec only) |
| **Capabilities** | Camera ❌ Canvas ❌ Screen ❌ Notifications ❌ Browser Proxy ✅ Exec ✅ Location ❌ Audio/TTS ❌ |
| **Networking** | WSL2 NAT — `localhost` works but external access needs `--bind` + firewall rules. HTTPS can be tricky with self-signed certs. |
| **Setup complexity** | Install WSL2 → install Node.js → install openclaw → configure networking → hope NAT cooperates |
| **UX Rating** | ⭐⭐ Functional but headless. The agent is blind. |
**Pain points:**
- WSL2's NAT means `127.0.0.1` inside WSL ≠ `127.0.0.1` on Windows
- No way to interact with the Windows desktop
- Browser proxy works but can't see what the user sees
- Every WSL2 restart may change the internal IP
---
### Scenario 3: Windows Only — WSL2 Gateway + Tray App as Client ⭐⭐⭐
| Aspect | Details |
|--------|---------|
| **Gateway** | WSL2 (Ubuntu) |
| **Nodes** | None registered as node — tray app is operator-only |
| **Capabilities** | Camera ❌ Canvas ❌ (WebChat only) Screen ❌ Notifications ⚠️ (tray-side only, not agent-driven) Browser ❌ Exec ✅ (WSL2) Location ❌ Audio/TTS ❌ |
| **Networking** | WSL2 → Windows: `localhost:18789` usually works. Windows → WSL2: same. But HTTPS cert validation can fail for WebView2 connecting to WSL2's self-signed cert. |
| **Setup complexity** | Medium — WSL2 + openclaw + configure tray app to point at `ws://localhost:18789` |
| **UX Rating** | ⭐⭐⭐ Nice UI wrapper but agent still can't see or interact with Windows |
This is what the tray app provides *today*. Quick Send, embedded WebChat, status display. But it's a viewport into the agent, not a bridge for the agent to interact with Windows.
---
### Scenario 4: Windows Only — WSL2 Gateway + Tray App as Native Node ⭐⭐⭐⭐
| Aspect | Details |
|--------|---------|
| **Gateway** | WSL2 (Ubuntu) |
| **Nodes** | OpenClaw.Tray registers as `role: "node"` from Windows |
| **Capabilities** | Camera ✅ (MediaCapture API) Canvas ✅ (WebView2) Screen ✅ (Graphics Capture) Notifications ✅ (Toast + agent-driven) Browser ❌ (WSL2 browser proxy) Exec ✅ (WSL2 + optionally Windows `cmd`/`powershell`) Location ⚠️ (Windows Location API — desktop, less useful) Audio/TTS ✅ (Windows Speech) |
| **Networking** | WSL2 NAT still involved for gateway, but tray app connects outward to WSL2's WS — simpler direction. |
| **Setup complexity** | Medium — WSL2 gateway + tray app auto-discovers and pairs |
| **UX Rating** | ⭐⭐⭐⭐ Agent can now see and interact with Windows! |
**This is the sweet spot for Phase 1.** The gateway stays in WSL2 (proven, works), but the tray app lights up all the Windows-native capabilities. The agent gains eyes and hands on Windows.
---
### Scenario 5: Windows Native Gateway + Tray App as Node ⭐⭐⭐⭐⭐
| Aspect | Details |
|--------|---------|
| **Gateway** | Windows native (Node.js on Windows — `node.exe`) |
| **Nodes** | OpenClaw.Tray as full Windows node |
| **Capabilities** | Camera ✅ Canvas ✅ Screen ✅ Notifications ✅ Browser ✅ (Playwright on Windows) Exec ✅ (native `cmd.exe`, PowerShell, `wsl.exe`) Location ⚠️ Audio/TTS ✅ |
| **Networking** | `ws://127.0.0.1:18789` — pure loopback, no NAT, no WSL2 networking issues |
| **Setup complexity** | Low — `npm install -g openclaw && openclaw onboard` from PowerShell. Same as Mac. |
| **UX Rating** | ⭐⭐⭐⭐⭐ True feature parity with Mac |
**The dream.** No WSL2 dependency at all. The gateway runs natively on Windows (Node.js works fine on Windows), and the tray app provides all native capabilities. This is the Mac experience, on Windows.
**Key question:** Does the OpenClaw gateway actually *work* on Windows? It's Node.js, so *in theory* yes. But there may be Unix-specific assumptions (signals, file paths, spawning, etc.) that need auditing. See [Architectural Questions](#architectural-questions).
---
### Scenario 6: Mac Gateway + Windows WSL2 Node (Current Multi-Machine) ⭐⭐⭐⭐
| Aspect | Details |
|--------|---------|
| **Gateway** | macOS (local Mac) |
| **Nodes** | macOS native + WSL2 headless node on Windows |
| **Capabilities** | Full Mac capabilities + Windows exec via WSL2 node |
| **Networking** | Tailnet or SSH tunnel between machines. Reliable but requires network setup. |
| **Setup complexity** | Medium — two machines, tailnet/SSH, node pairing |
| **UX Rating** | ⭐⭐⭐⭐ Great for multi-machine setups where Mac is primary |
**Today's power-user setup.** Works well for "Mac as brain, Windows as build server" use cases. Adding tray-app-as-node would make this ⭐⭐⭐⭐⭐.
---
### Scenario 7: Mac Gateway + Tray App as Windows Node ⭐⭐⭐⭐⭐ (with Node)
| Aspect | Details |
|--------|---------|
| **Gateway** | macOS |
| **Nodes** | macOS native + Windows native (tray app) |
| **Capabilities** | Everything from Mac + camera, canvas, screen, notifications on Windows |
| **Networking** | Tailnet/LAN between Mac gateway and Windows tray app |
| **Setup complexity** | Medium — network between machines, but tray app handles pairing |
| **UX Rating** | ⭐⭐⭐⭐⭐ Best of both worlds for multi-machine |
The agent can see both the Mac and Windows desktops, capture from either machine's camera, show canvas on both screens. Multi-machine nirvana.
---
### Scenario 8: WSL2 Gateway + Mac Node ⭐⭐⭐½
| Aspect | Details |
|--------|---------|
| **Gateway** | WSL2 on Windows |
| **Nodes** | macOS native app connecting to Windows WSL2 gateway |
| **Capabilities** | Full Mac node capabilities, but gateway is in WSL2 |
| **Networking** | WSL2 must bind non-loopback (`--bind 0.0.0.0` or tailnet). Mac connects to Windows IP. |
| **Setup complexity** | High — WSL2 networking config + cross-machine pairing |
| **UX Rating** | ⭐⭐⭐½ Unusual topology but works. Why not put gateway on Mac? |
Niche scenario. If the "server" must be Windows for some reason, this works but Mac-gateway-with-Windows-node is almost always better.
---
### Summary Table
| # | Scenario | Gateway | Node(s) | Capabilities | Complexity | Rating |
|---|----------|---------|---------|-------------|------------|--------|
| 1 | Mac only | macOS | macOS app | Full | Low | ⭐⭐⭐⭐⭐ |
| 2 | Win WSL2 only | WSL2 | WSL2 headless | Exec only | High | ⭐⭐ |
| 3 | Win WSL2 + tray client | WSL2 | None (operator) | Exec + UI | Medium | ⭐⭐⭐ |
| 4 | **Win WSL2 + tray node** | WSL2 | **Tray app (node)** | **Most** | **Medium** | **⭐⭐⭐⭐** |
| 5 | **Win native gateway + tray node** | **Windows** | **Tray app (node)** | **Full** | **Low** | **⭐⭐⭐⭐⭐** |
| 6 | Mac gw + WSL2 node | macOS | macOS + WSL2 | Mac full + Win exec | Medium | ⭐⭐⭐⭐ |
| 7 | **Mac gw + tray node** | macOS | macOS + **Tray app** | **Full both** | Medium | **⭐⭐⭐⭐⭐** |
| 8 | WSL2 gw + Mac node | WSL2 | macOS app | Mac full | High | ⭐⭐⭐½ |
**Bold = new scenarios this issue enables.**
---
## Capability Matrix by Node Type
| Capability | macOS App | iOS App | Android App | WSL2 Headless | **Windows Tray (proposed)** | Windows API |
|-----------|-----------|---------|-------------|---------------|---------------------------|-------------|
| `canvas.present` | ✅ SwiftUI WebView | ✅ WKWebView | ✅ WebView | ❌ | **✅ WebView2** | WebView2 |
| `canvas.snapshot` | ✅ | ✅ | ✅ | ❌ | **✅** | WebView2 CapturePreviewAsync |
| `canvas.eval` | ✅ | ✅ | ✅ | ❌ | **✅** | WebView2 ExecuteScriptAsync |
| `canvas.a2ui` | ✅ | ✅ | ✅ | ❌ | **✅** | WebView2 |
| `camera.snap` | ✅ AVFoundation | ✅ AVFoundation | ✅ CameraX | ❌ | **✅** | Windows.Media.Capture |
| `camera.clip` | ✅ | ✅ | ✅ | ❌ | **✅** | MediaCapture + MediaEncoding |
| `camera.list` | ✅ | ✅ | ✅ | ❌ | **✅** | DeviceInformation.FindAllAsync |
| `screen.record` | ✅ CGWindowListCreateImage | ✅ ReplayKit | ✅ MediaProjection | ❌ | **✅** | Windows.Graphics.Capture |
| `system.run` | ✅ | ❌ | ❌ | ✅ | **✅** | Process.Start (cmd/pwsh) |
| `system.notify` | ✅ NSUserNotification | ✅ UNUserNotification | ✅ NotificationManager | ❌ | **✅** | ToastNotificationManager |
| `location.get` | ✅ CLLocationManager | ✅ CLLocationManager | ✅ FusedLocation | ❌ | **⚠️** | Windows.Devices.Geolocation |
| `sms.send` | ❌ | ❌ | ✅ | ❌ | ❌ | N/A |
| Browser proxy | ✅ | ❌ | ❌ | ✅ Playwright | **⚠️ Future** | Playwright on Windows |
| Accessibility | ✅ AX API | ❌ | ❌ | ❌ | **⚠️ Future** | UI Automation |
| Speech/TTS | ✅ NSSpeechSynthesizer | ❌ | ❌ | ❌ | **✅** | Windows.Media.SpeechSynthesis |
| Microphone | ✅ AVAudioEngine | ✅ | ✅ | ❌ | **⚠️ Future** | Windows.Media.Audio |
---
## Node Protocol Overview
For contributors: here's what implementing a Windows node means at the protocol level.
### 1. Connect as a node
The tray app's `OpenClawGatewayClient` currently connects as an **operator**. To become a node, it needs to send (or send an additional) `connect` with `role: "node"`:
```json
{
"type": "req",
"id": "connect-1",
"method": "connect",
"params": {
"minProtocol": 3,
"maxProtocol": 3,
"client": {
"id": "windows-tray",
"version": "1.0.0",
"platform": "windows",
"mode": "node"
},
"role": "node",
"scopes": [],
"caps": ["canvas", "camera", "screen", "notifications", "system"],
"commands": [
"canvas.present", "canvas.hide", "canvas.navigate",
"canvas.eval", "canvas.snapshot", "canvas.a2ui.push",
"canvas.a2ui.reset",
"camera.list", "camera.snap", "camera.clip",
"screen.record",
"system.run", "system.notify",
"system.execApprovals.get", "system.execApprovals.set"
],
"permissions": {
"camera.capture": true,
"screen.record": true
},
"auth": { "token": "..." },
"device": {
"id": "windows-machine-fingerprint",
"publicKey": "...",
"signature": "...",
"signedAt": 1706745600000,
"nonce": "..."
}
}
}
```
### 2. Handle `node.invoke` requests
The gateway sends commands via `node.invoke`:
```json
{
"type": "req",
"id": "invoke-42",
"method": "node.invoke",
"params": {
"command": "canvas.snapshot",
"args": { "format": "png", "maxWidth": 1200 }
}
}
```
The tray app responds:
```json
{
"type": "res",
"id": "invoke-42",
"ok": true,
"payload": {
"format": "png",
"base64": "iVBORw0KGgo..."
}
}
```
### 3. Dual-role connection
The tray app could connect **twice** (operator + node) or the protocol may support a **dual-role** connection. Operator gives Quick Send / status / WebChat. Node gives the agent capabilities. Both over the same WebSocket.
**Investigation needed:** Can a single WS connection carry both roles, or does it need two connections?
---
## Windows API Mapping
### Canvas → WebView2
The tray app *already has WebView2* for WebChat (#5 is the Canvas Panel issue). The same control can serve as the node canvas surface.
```csharp
// canvas.present — navigate WebView2 to a URL
await webView.CoreWebView2.Navigate(url);
// canvas.eval — execute JavaScript
string result = await webView.CoreWebView2.ExecuteScriptAsync(js);
// canvas.snapshot — capture the WebView2 content
using var stream = new InMemoryRandomAccessStream();
await webView.CoreWebView2.CapturePreviewAsync(
CoreWebView2CapturePreviewImageFormat.Png, stream);
byte[] bytes = new byte[stream.Size];
await stream.ReadAsync(bytes.AsBuffer(), (uint)stream.Size, InputStreamOptions.None);
return Convert.ToBase64String(bytes);
```
**Blocker:** #9 — WebView2 fails to initialize on ARM64 in WinUI 3 unpackaged mode. This needs resolution first.
### Camera → Windows.Media.Capture / MediaFoundation
```csharp
// camera.list
var devices = await DeviceInformation.FindAllAsync(DeviceClass.VideoCapture);
// camera.snap
var capture = new MediaCapture();
await capture.InitializeAsync(new MediaCaptureInitializationSettings {
VideoDeviceId = deviceId,
StreamingCaptureMode = StreamingCaptureMode.Video
});
var photo = await capture.CapturePhotoToStreamAsync(
ImageEncodingProperties.CreateJpeg(), stream);
```
For WinUI 3 / .NET, the [Windows.Media.Capture](https://learn.microsoft.com/en-us/uwp/api/windows.media.capture) namespace is available. Alternatively, `MediaFoundation` via COM interop gives more control.
### Screen Capture → Windows.Graphics.Capture
The [Graphics Capture API](https://learn.microsoft.com/en-us/windows/uwp/audio-video-camera/screen-capture) (Windows 10 1803+) provides screen recording:
```csharp
// screen.record
var picker = new GraphicsCapturePicker();
var item = await picker.CreateForMonitorAsync(monitorHandle);
// Or capture programmatically without picker (requires capability declaration)
var framePool = Direct3D11CaptureFramePool.Create(device, pixelFormat, 2, size);
var session = framePool.CreateCaptureSession(item);
session.StartCapture();
```
**Note:** Programmatic capture (without the user picker) requires the `graphicsCapture` restricted capability or using `CreateForMonitorAsync`. On Windows 11+, `GraphicsCaptureAccess.RequestAccessAsync` enables background capture.
### Notifications → ToastNotificationManager
```csharp
// system.notify — agent-driven notifications
var xml = ToastNotificationManager.GetTemplateContent(ToastTemplateType.ToastText02);
var textNodes = xml.GetElementsByTagName("text");
textNodes[0].InnerText = title;
textNodes[1].InnerText = body;
var toast = new ToastNotification(xml);
ToastNotificationManager.CreateToastNotifier("OpenClaw.Tray").Show(toast);
```
The tray app *already does* toast notifications from gateway events. The change is to also handle `system.notify` commands from the node protocol so the agent can *request* a notification.
### System Exec → Process.Start
```csharp
// system.run
var process = new Process {
StartInfo = new ProcessStartInfo {
FileName = "powershell.exe",
Arguments = $"-Command \"{command}\"",
RedirectStandardOutput = true,
RedirectStandardError = true,
UseShellExecute = false,
CreateNoWindow = true,
WorkingDirectory = cwd
}
};
process.Start();
string stdout = await process.StandardOutput.ReadToEndAsync();
string stderr = await process.StandardError.ReadToEndAsync();
await process.WaitForExitAsync();
```
**Critical:** Exec approvals must be enforced locally, same as macOS/headless nodes. Store in `%APPDATA%\OpenClaw\exec-approvals.json`.
### Location → Windows.Devices.Geolocation
```csharp
var geolocator = new Geolocator {
DesiredAccuracy = PositionAccuracy.High
};
var position = await geolocator.GetGeopositionAsync();
// position.Coordinate.Point.Position.Latitude / .Longitude
```
**Note:** Desktop PCs usually have poor location accuracy (IP-based). Laptops with WiFi can do better. This is a "nice to have" — lower priority than camera/canvas/screen.
### TTS → Windows.Media.SpeechSynthesis
```csharp
var synth = new SpeechSynthesizer();
var stream = await synth.SynthesizeTextToStreamAsync(text);
// Play via MediaElement or save to file
```
---
## Architectural Questions
### 1. Should the tray app be a dual-role connection (operator + node)?
**Recommendation: Yes, dual-role.**
The tray app already maintains a WebSocket connection as an operator. It should *also* register as a node on the same or a second connection. This means:
- **Option A:** Single WS, dual role — connect once with `role: ["operator", "node"]` (if protocol supports it)
- **Option B:** Two WS connections — one operator (existing), one node (new)
- **Option C:** Node-only, deprecate operator features — bad idea, lose Quick Send / status
Option A is cleanest but requires protocol support. Option B works today with no gateway changes.
### 2. Can the OpenClaw gateway run natively on Windows?
**Likely yes, with work.**
The gateway is Node.js. Node.js runs natively on Windows. But:
| Concern | Risk | Notes |
|---------|------|-------|
| Unix signals (SIGTERM, SIGHUP) | Medium | Gateway likely uses process signals. Windows has different signal model. Node.js abstracts some of this but not all. |
| File paths (forward vs back slash) | Low | Node.js `path` module handles this if used consistently. |
| Spawning child processes | Medium | `spawn('sh', ['-c', ...])` won't work on Windows. Need `cmd.exe` or `powershell.exe`. |
| `launchd`/`systemd` service install | High | `openclaw onboard --install-daemon` installs a launchd/systemd service. Windows needs a Windows Service or Task Scheduler equivalent. |
| WhatsApp/Telegram/Discord channels | Low | These are network clients, platform-agnostic. |
| Pi agent RPC | Low | Spawns Node.js processes — should work cross-platform. |
| File watching (chokidar) | Low | Works on Windows. |
| Browser automation (Playwright) | Low | Playwright supports Windows natively. |
**Recommendation:** Audit the gateway codebase for Unix assumptions. This could be a relatively tractable porting effort — most of the gateway is pure Node.js WebSocket/HTTP work.
### 3. What about the service lifecycle on Windows?
On macOS: launchd plist. On Linux: systemd unit. On Windows, options include:
- **Windows Service** (via [node-windows](https://github.com/coreybutler/node-windows) or .NET service host)
- **Task Scheduler** (run at logon)
- **Startup folder** (simplest, least robust)
- **Tray app manages gateway process** (like macOS menubar app can start/stop gateway)
The Mac menubar app has "Gateway start/stop/restart" in its menu. The tray app has this marked as ❌ in the parity table. If the gateway runs on Windows, the tray app could manage it.
### 4. WSL2 networking: the NAT problem
WSL2 runs behind a NAT. The implications:
| Direction | Works? | Notes |
|-----------|--------|-------|
| Windows → WSL2 localhost | ✅ Usually | `localhost` forwarding works for TCP. |
| WSL2 → Windows localhost | ⚠️ Varies | Use `$(hostname).local` or `host.docker.internal`. |
| External → WSL2 | ❌ By default | Needs port forwarding or `--bind 0.0.0.0`. |
| WSL2 → External | ✅ | NAT outbound works fine. |
**For the tray-app-as-node scenario:** The tray app (Windows) connects *outward* to the WSL2 gateway. This is the easy direction — Windows → WSL2 localhost works. No NAT issues.
**For native Windows gateway:** No NAT at all. Everything is loopback. Problem solved.
### 5. Dual canvas: WebChat + Node Canvas
The tray app currently uses WebView2 for WebChat. The node canvas is a *separate* surface. Options:
- **Two WebView2 instances** — one for chat, one for canvas (each in its own window/panel)
- **Tab-based UI** — WebView2 with tab switching between chat and canvas
- **Canvas as separate window** — floating overlay window with WebView2 (like macOS canvas)
**Recommendation:** Separate floating window for canvas (matches macOS behavior). The chat WebView2 stays in the tray flyout/window. Canvas appears when the agent calls `canvas.present` and hides on `canvas.hide`.
### 6. Device identity + pairing
The node protocol requires a stable device identity (`device.id`) derived from a keypair. The tray app needs to:
1. Generate an Ed25519 keypair on first run
2. Store it in `%APPDATA%\OpenClaw\device.json`
3. Derive a fingerprint as the device ID
4. Sign the challenge nonce during connect
5. Handle the pairing approval flow (first time only; device token persisted after approval)
.NET has `System.Security.Cryptography` for Ed25519 (or use a NuGet package for older .NET versions).
---
## Phased Roadmap
### Phase 1: Tray App as Native Windows Node — Notifications + Canvas
**Priority: HIGH | Effort: Medium | Impact: Huge**
- [ ] Implement node protocol in `OpenClaw.Shared` (connect with `role: "node"`, handle `node.invoke`)
- [ ] Device identity + keypair generation + pairing flow
- [ ] `system.notify` — agent can request Windows toast notifications
- [ ] `canvas.present` / `canvas.hide` — floating WebView2 canvas window
- [ ] `canvas.navigate` / `canvas.eval` / `canvas.snapshot` — full canvas support
- [ ] `canvas.a2ui.push` / `canvas.a2ui.reset` — A2UI rendering
- [ ] `system.run` — exec commands on Windows (PowerShell/cmd) with exec approvals
- [ ] Settings UI for node capabilities (enable/disable camera, screen, etc.)
- [ ] Resolve #9 (WebView2 ARM64) — required for canvas
**Depends on:** #5 (Canvas Panel), #9 (WebView2 ARM64)
### Phase 2: Screen Capture + Camera
**Priority: HIGH | Effort: Medium | Impact: High**
- [ ] `camera.list` — enumerate Windows cameras
- [ ] `camera.snap` — capture photo from webcam
- [ ] `camera.clip` — record short video clip
- [ ] `screen.record` — capture Windows desktop via Graphics Capture API
- [ ] Permission prompts (camera, screen capture consent)
- [ ] Multi-monitor support for screen capture (`--screen <index>`)
### Phase 3: Native Windows Gateway (Exploration)
**Priority: MEDIUM | Effort: High | Impact: High**
- [ ] Audit OpenClaw gateway for Unix-specific code
- [ ] Test `openclaw gateway` on Windows (Node.js native)
- [ ] Fix platform-specific issues (signals, paths, child process spawning)
- [ ] Windows Service integration for daemon mode
- [ ] Tray app: "Start/Stop/Restart Gateway" menu items (parity with Mac menubar)
- [ ] `openclaw onboard --install-daemon` for Windows (Task Scheduler or Windows Service)
- [ ] Document Windows-native gateway setup
### Phase 4: Feature Parity + Polish
**Priority: LOW | Effort: Medium | Impact: Medium**
- [ ] `location.get` — Windows Location API
- [ ] TTS / Speech Synthesis
- [ ] Microphone / voice input
- [ ] Browser proxy (Playwright on Windows, launched by tray app)
- [ ] UI Automation (Windows equivalent of macOS Accessibility API)
- [ ] Auto-update improvements (current auto-update from GitHub Releases → MSI/MSIX?)
- [ ] PowerToys Command Palette integration for node commands
---
## Technical Deep Dives
### Architecture: Node Protocol Handler
```
OpenClaw.Shared/
├── OpenClawGatewayClient.cs ← existing operator client
├── OpenClawNodeClient.cs ← NEW: node protocol handler
├── INodeCommandHandler.cs ← NEW: interface for command dispatch
├── NodeIdentity.cs ← NEW: keypair + device ID
└── Models/
├── NodeConnectParams.cs ← NEW
├── NodeInvokeRequest.cs ← NEW
└── NodeInvokeResponse.cs ← NEW
OpenClaw.Tray/
├── Services/
│ ├── NodeService.cs ← NEW: orchestrates node connection
│ ├── CanvasService.cs ← NEW: handles canvas.* commands
│ ├── CameraService.cs ← NEW: handles camera.* commands
│ ├── ScreenService.cs ← NEW: handles screen.* commands
│ ├── SystemService.cs ← NEW: handles system.* commands
│ └── ExecApprovals.cs ← NEW: local approval store
├── Windows/
│ ├── CanvasWindow.xaml ← NEW: floating WebView2 canvas
│ └── CanvasWindow.xaml.cs
```
### Architecture: Dual-Role Connection Flow
```
Tray App Start
├─ Load settings (gateway URL, token)
├─ Load/generate device identity (keypair)
├─ Connect WS #1: role=operator
│ ├─ Quick Send, status, WebChat, channel control
│ └─ (existing functionality)
└─ Connect WS #2: role=node
├─ Advertise caps: [canvas, camera, screen, system, notifications]
├─ Advertise commands: [canvas.*, camera.*, screen.*, system.*]
├─ Handle node.invoke requests
│ ├─ canvas.present → show/navigate CanvasWindow
│ ├─ canvas.snapshot → WebView2 CapturePreview
│ ├─ camera.snap → MediaCapture → JPEG → base64
│ ├─ screen.record → GraphicsCapture → MP4 → base64
│ ├─ system.run → Process.Start → stdout/stderr
│ └─ system.notify → ToastNotification
└─ Report permissions changes
```
---
## Contributing
This is a big effort and **contributions are very welcome!** Here's how to get started:
### Good First Issues
1. **Device identity module** — Generate Ed25519 keypair, store in `%APPDATA%`, derive fingerprint. Pure crypto, well-defined scope.
2. **`system.notify` handler** — Accept title + body + priority, show a Windows toast. The tray app already shows toasts — this just adds the node protocol wrapper.
3. **`system.run` handler** — Execute a command via `Process.Start`, return stdout/stderr/exit code. Add exec approvals.
### Medium Issues
4. **Node protocol client** (`OpenClawNodeClient`) — WebSocket connect with `role: "node"`, handle `node.invoke` dispatch. Builds on the existing `OpenClawGatewayClient`.
5. **Canvas floating window** — WebView2 in a borderless/floating window that appears on `canvas.present` and hides on `canvas.hide`. Related: #5.
### Harder Issues
6. **Camera capture**`Windows.Media.Capture` for photos and video clips. Handle permissions, multiple cameras, front/back mapping.
7. **Screen recording**`Windows.Graphics.Capture` for screen recording. Handle multi-monitor, permission consent, encoding to MP4.
8. **Native Windows gateway audit** — Run `openclaw gateway` on Windows, identify and fix platform-specific failures.
### Development Setup
See #7 / #8 for DEVELOPMENT.md. Quick start:
```bash
git clone https://github.com/shanselman/openclaw-windows-hub.git
cd openclaw-windows-hub
dotnet build
dotnet run --project src/OpenClaw.Tray
```
Requires .NET 10.0 SDK, Windows 10/11. For testing node protocol, you'll need a running OpenClaw gateway (in WSL2 or on another machine).
---
## Open Questions
- [ ] Does the gateway protocol support dual-role connections, or must we open two WebSockets?
- [ ] What's the minimum `PROTOCOL_VERSION` the node connect needs? (Currently 3)
- [ ] Should exec from a Windows node default to PowerShell or cmd.exe?
- [ ] How should the tray app handle "node in background" — Windows can suspend tray apps. Do we need a background service?
- [ ] Can the Graphics Capture API work without a visible window / user picker? (Background capture requires Windows 11+)
- [ ] Should we pursue MSIX packaging for the tray app to unlock restricted capabilities?
---
*This issue is a living document. As we make progress, sub-issues will be filed for individual work items and linked back here.*
/cc @shanselman