Companion application refactoring (#272)

* feat: unified Hub window with NavigationView, slim tray menu, and inline toggles

Consolidate 8 separate windows into a single Hub app:
- New HubWindow with NavigationView (Chat, Home, Activity, Settings pages)
- Chat page embeds gateway Control UI via WebView2 (default landing page)
- Home page with live status cards, quick actions, and activity feed
- Settings page with Expander sections, Test Connection, SSH tunnel fields
- Activity page with category filters and live ActivityStreamService binding
- Slim tray menu: status + 3 inline toggles + Hub/QuickSend + Settings/Exit
- Acrylic backdrop on tray flyout, auto-collapsing nav, page transitions
- Deep links (openclaw://) redirected to Hub pages
- Deleted 5 old windows: StatusDetail, ActivityStream, NotificationHistory, WebChat, Settings
- WebView2 event handler cleanup on page Unloaded (code review fix)
- Deferred page init to avoid null Settings during Frame.Navigate

25 files changed, 1350 insertions(+), 2033 deletions(-) — net code reduction

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: settings validation, CSS sidebar hiding, toggle labels, and SSH tunnel support

- Settings save validates gateway URL and SSH tunnel fields before saving
- Test Connection supports SSH tunnel mode (starts temp tunnel for test)
- SSH toggle auto-updates gateway URL field (shows loopback in tunnel mode)
- Chat page injects CSS to hide web Control UI sidebar (no dual navigation)
- Tray toggle switches hide On/Off labels to prevent clipping

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: full native UI with 12 pages, config editor, dual connection, and live gateway data

New pages: Sessions, Channels, Usage, Nodes, Cron, Skills, Config, About
- Sessions: live session list with reset/delete/compact actions
- Channels: channel health cards with start/stop controls
- Usage: cost breakdown, provider stats, daily costs
- Nodes: node inventory with capabilities and device ID copy
- Cron: scheduled jobs with run/remove actions (gateway wired)
- Skills: installed skills status (gateway wired)
- Config: TreeView + detail panel with editable values via config.set protocol
- About: version info, debug tools, documentation links

Infrastructure:
- Dual WebSocket connection: operator client (UI data) + node client (commands)
- 14 new gateway methods: cron.*, skills.*, config.* with JsonElement.Clone()
- Data caching in HubWindow for instant page navigation
- Session startedAt parsing fix (handles number + string timestamps)
- Hub window synced after settings reconnect
- Tray test updated for Auto scrollbar

Build: 0 errors | Tests: 774 passing (652 shared + 122 tray)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* UX Round 2: Agent events, discovery, pairing, models, presence, menu redesign, timer removal

Features:
- Agent Events page with stream filters and 400-event ring buffer
- Gateway discovery via mDNS (Zeroconf) with scan UI in General page
- Node/Device pairing UI with approve/reject in Nodes page
- Models list in Sessions page via models.list gateway method
- Instances page rewired to presence data from handshake snapshot
- Context cards in tray menu (session summary + token usage bars)
- Pairing pending count in tray menu

Menu & UX:
- Merged status + toggle into rich header card with status dot
- Permission toggles section (browser, camera, exec, canvas, screen)
- Renamed hub to Windows Companion with smart disconnect navigation
- Custom title bar with live connection status + gateway version
- Single-click tray -> chat, double-click -> hub
- Chat window pre-warmed on startup, hides instead of closing

Architecture:
- Removed all background timers (10s poll + 30s health check)
- On-demand data loading only (pages fetch on navigation)
- Fixed ParseSessions to merge instead of clear-then-rebuild (no flicker)
- Fixed HandleAgentEvent sessionKey parsing (was reading from root, not payload)
- Symmetric subscribe/unsubscribe for all new gateway events
- Caches cleared on disconnect, seeded into HubWindow on open

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restructure navigation: agents as hierarchical nav items under Gateway

- Replace flat Agent section with hierarchical Agents > {agentId} > {sub-pages} structure
- Move Conversations shortcut to top level, pointing to SessionsPage
- Remove AgentSelectorNav ComboBox, replaced with dynamic nav tree (RebuildAgentNavItems)
- Add FindAndSelectNavItem for recursive nested nav item selection
- Add agent: tag parsing (ResolveAgentPageType, ParseAgentIdFromTag)
- Update NavigateTo with legacy flat tag mapping to new agent: prefix format
- Strip Zone B (Agent Roster) and Zone C (Capability Toggles) from HomePage
- Remove Node Mode toggle from SettingsPage (lives in CapabilitiesPage)
- Update command palette to use agent-scoped tags
- NavigateToDefault now goes to Home instead of Conversations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* UX Round 3: Hearth redesign, hierarchical nav, command palette, agent APIs

Navigation restructure:
- 16 flat pages → domain-grouped hierarchical nav (Gateway, This Computer)
- Each agent gets expandable nav item with Sessions, Events, Skills, Workspace
- Dynamic agent nav built from agents.list gateway response
- Nodes nested under Instances (superset relationship)
- Cron moved to gateway section (gateway-wide, not per-agent)
- Connection page extracted from Settings into Gateway section
- Settings simplified to local-only (startup, notifications)

New pages:
- ConnectionPage — gateway URL, token, SSH tunnel, discovery, status
- CapabilitiesPage — device toggles + node status indicator
- WorkspacePage — TabView per-file viewer via agents.files.list/get
- BindingsPage — routing rules viewer (channel→agent)
- ConversationsPage — cross-agent session browser (legacy)

Command palette (Ctrl+K / Ctrl+F):
- Inline overlay with light dismiss (click outside, Escape, Enter)
- 20+ navigation + 5 toggle + dynamic session commands
- Fuzzy substring filtering

Home page (The Hearth):
- Molty status indicator with colored ring
- Natural language status text
- Quick action buttons

Agent-scoped data:
- Sessions, Skills pass agentId to gateway calls
- Agent Events filtered by sessionKey prefix
- Workspace scoped to current agent in hierarchy

Real gateway APIs:
- agents.list → dynamic nav + agent roster
- agents.files.list/get → workspace file viewer
- Cached agents list for hub seeding on open

Architecture:
- Canvas window styled with Mica + custom title bar
- Open Canvas menu item uses NodeService.ShowCanvasWindow()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(tray): add rich session tooltips and connected devices section

- Add rich ToolTip to each session label showing model, provider,
  channel, thinking level, token breakdown, context window, status,
  and age
- Add Connected Devices section between context summary and Permissions
  with online indicator dots, platform badges, and rich tooltips
- Show connected client count from presence data in status header subtitle

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Redesign session and device tray menu with rich compact cards

Replace plain text session/device rows with structured Grid cards featuring:
- Status dot (green/amber/gray) + name + model/platform badge + chevron
- Token usage with percentage and color-coded progress bar
- Channel badges and capability icons for devices
- Section headers with right-aligned summary stats

Add AddFlyoutCustomItem() to TrayMenuWindow for custom UIElement
flyout items with hover-to-show and click-to-navigate behavior.

Build detailed side flyout panels with headers, token breakdowns,
capability listings, and session metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* UX Round 4: Chat popup, rich tray menu, schema config editor

Chat panel:
- Tray-anchored borderless popup (bottom-right, DPI-aware)
- WS_EX_TOOLWINDOW + no caption/frame (hidden from taskbar)
- Auto-hide on deactivate, instant show (no animation — WebView2 incompatible)
- Single-click toggle, double-click opens Hub (400ms detection)
- Chat window recreated on settings change (stale URL fix)

Tray menu:
- Rich 2-line session cards: status dot, model badge, token progress bar
- Rich device cards: capability emoji strip, platform badge
- Flyout detail panels: non-interactive TextBlocks (not menu items)
- Session flyout: model/provider, channel, ASCII token bar, thinking/verbose
- Device flyout: capabilities merged with commands (cap as header, cmds indented)
- Dynamic capability toggles under local device (from node.Capabilities)
- Flyout dismisses on any menu item hover (including separators/headers/toggles)
- Section headers with aggregates (sessions/tokens, online/caps)
- AddFlyoutCustomItem + AddToggleItem indent support

Schema config editor:
- SchemaConfigEditor UserControl renders from JSON Schema
- Supports string/number/boolean/enum/array/nested objects
- Sensitive field detection (PasswordBox)
- Fallback RenderConfigDirectly when schema unavailable
- Config detail panel uses schema for selected tree node
- Pending changes preserved on save failure

Command palette:
- Rebuilt as inline overlay Grid (light dismiss: Escape + click-outside)
- Ctrl+F added as alternate shortcut
- TextBox replaces AutoSuggestBox (proper Escape handling)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* UX Round 5: Connection dashboard, tray polish, review fixes

Connection Management:
- Redesigned ConnectionPage with 6-section card layout: status card with
  live gateway info, gateway discovery picker with mDNS scan, setup code
  paste (from openclaw qr), manual connection expander, device identity
  card with pairing status and copyable approval command, connection log
- Auto-discovery when disconnected or unconfigured
- Setup code applies both bootstrapToken and Token for immediate connect
- PreferredGatewayId persisted in settings

Tray Menu Polish:
- Compact 3-column ToggleButton grid for capability toggles (all 7 shown)
- Split header subtitle into 2 lines (connection details + node status)
- Auth failure warning shown inline
- Reconnect and Connection quick actions
- Dropped 'Open' prefix from action items, added bottom padding

Hub Window:
- Removed XAML KeyboardAccelerators (caused tooltip flicker on hover)
- Replaced with PreviewKeyDown handler for Ctrl+K/F
- Added ReconnectAction, LastGatewaySelf, node state properties

Connection Lifecycle Fixes (from 3-model rubber-duck review):
- Capability toggles now use ReconnectNodeServiceOnly() instead of full
  teardown — no longer kills gateway client or chat window
- Reconnect action uses lightweight ReconnectGateway() (preserves chat)
- SyncHubNodeState() pushes live pairing/identity to hub on every
  node status and pairing change
- Gateway matching uses host:port comparison (not full URL with scheme)
- Discovery service disposed on page Unloaded
- Connection log refreshes on every status change
- SanitizeUrl guards against port -1
- Null-conditional restored on _hub?.RaiseSettingsSaved()
- Synthesized current gateway entry doesn't mutate cached list

Other:
- Instant single-click chat toggle (removed double-click debounce)
- Catch-all ShowHub(action) for menu nav tags
- SSH tunnel section flattened (removed redundant nested expander)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* UX Round 6: Token bar ProgressBar, title bar search, MCP toggle

Session flyout:
- Replaced ASCII token bar (█░) with WinUI ProgressBar (green/orange/red)
- Built flyout as native UIElement (StackPanel) instead of text items
- Added AddFlyoutCustomItem(UIElement, UIElement, action) overload

Title bar search:
- Replaced hidden command palette overlay with AutoSuggestBox in title bar
- Standard Windows pattern, always visible, Ctrl+K/F focuses it
- Lobster icon 14px → 20px, title shortened to 'OpenClaw'
- Removed overlay XAML, smoke layer, and palette methods

MCP server toggle:
- Added Local MCP Server card on Capabilities page
- Toggle, endpoint URL display, Copy Token/URL buttons
- Shows token readiness status

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix session flyout to match device flyout style

Replaced custom UIElement session flyout panel with simple
TrayMenuFlyoutItem list matching the device flyout pattern.
ProgressBar stays in the main menu session card only.
Removed unused AddFlyoutCustomItem(UIElement, UIElement) overload
and ShowCascadingFlyoutElement helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* UX Round 7: Nav restructure, App MCP, config editor, review fixes

Navigation:
- Sessions, Agent Events, Skills promoted to top-level with agent filter
- Agents folded to one level (direct nav → Workspace)
- Instances merged into Nodes with Connected Clients section
- Title restored to 'OpenClaw Windows Companion'
- Title bar: 48px height, responsive search (Ctrl+E), lobster 20px

Config page:
- Schema-driven tree (objects only, no leaf nodes)
- Editor + Raw JSON tabs
- config.patch sends { raw, baseHash } (hash from config.get response)
- Subtitle shows actual config file path from gateway
- All expanders open by default

App MCP capability (10 tools):
- app.navigate/status/sessions/agents/nodes/config.get
- app.settings.get/set with security allowlist (no secrets)
- app.menu/search for tray and command palette testing
- All handlers return structured data (not stringified JSON)
- Sessions filter by session key prefix (not channel)

Bug fixes:
- AgentEventsPage: init NRE guard, filter applies to display
- CapabilitiesPage: MCP toggle suppress during init
- SessionsPage: removed unused agent filter
- Config save: proper baseHash from gateway hash field

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* 5-model adversarial review fixes + regression tests

Fixes from Opus 4.7, Sonnet 4.5, GPT-5.4, GPT-5.2, Opus 4.6 reviews:
- Guard IndexOutOfRangeException on empty session Status
- Fix TCS hang when DispatcherQueue unavailable in app.navigate
- Static s_emptyObject replacing leaked JsonDocument in config tree
- Always prune stale sessions (removed incomingKeys.Count > 0 guard)
- try/catch/finally on 6 async void handlers (Channels, Sessions, Config)
- Seed ALL cached data before NavigateToDefault in ShowHub
- Move CurrentStatus/_cachedCommands inside DispatcherQueue in UpdateStatus
- Raw JSON tab uses 'parsed' not wrapper object
- Null-safe Subtitle in search handler
- Invalidate command cache on agent switch
- Dispose SemaphoreSlim in GatewayDiscoveryService

Regression tests (9 new):
- AppCapabilityTests: category, commands, CanHandle, handlers, errors
- SessionInfo empty Status guard
- ParseSessions empty array clears sessions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Tray menu: header redesign, dismiss fix, session reliability

Header:
- Split into brand header (🦞 OpenClaw) + Gateway section
- Gateway section: status dot, version/host:port, node status, labeled
  ToggleButton ('Connected'/'Disconnected') with tooltip
- Gateway info clickable → opens Connection page
- Menu dismisses after connect/disconnect toggle (avoids stale header)

Dismiss:
- Unified 150ms delayed foreground check for all deactivation cases
- Checks this window, flyout child, and owner parent before dismissing
- Fixes: click-away dismisses everything, hover between items doesn't
- Set _isShown=true in ShowAtCursor (was missing, broke dismiss guard)

Sessions:
- Removed connection status gate — show cached sessions always
- Zero gateway requests on menu open (health check was clearing sessions
  via ParseSessions in the response)
- Session cards click → 'sessions' top-level route

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Connection UX: localhost probe, auth errors, token prompt, gateway switching

Discovery:
- Localhost probe enumerates listening TCP ports via GetActiveTcpListeners
- Probes for gateway HTML signature (<title>OpenClaw Control</title>)
- Excludes MCP port (8765) to avoid false positives
- Runs in parallel with mDNS, results merged

Connection page:
- Auth error InfoBar with contextual guidance (token/pairing/password/signature)
- HubWindow.LastAuthError forwarded from OnAuthenticationFailed
- Cleared on successful connect and new connection attempts
- Token prompt always shows when switching gateways (pre-fills current token)
- Cancel button on token prompt
- Discovery list refreshes after connecting to show ✓

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Prepare UX experiments branch for PR

Fix gateway discovery host resolution, harden branch-introduced security paths, complete localization coverage, remove newly introduced dead code, refresh documentation, wire functional UX flows, and stabilize the Config page rendering path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix MCP-only tray startup

Initialize the local node service when MCP mode is enabled even if gateway node mode is disabled, so MCP-only tray launches start the HTTP server used by integration tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Ranjesh Jaganathan <ranjeshj@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

2026-05-04 20:48:24 -07:00


Local MCP Mode

Status: Implemented (initial cut). See src/OpenClaw.Shared/Mcp/, src/OpenClaw.Shared/Mcp/McpHttpServer.cs, and the Settings UI MCP section.

Summary

The Windows tray app now ships a local Model Context Protocol (MCP) server alongside its existing OpenClaw gateway client. The same node capabilities the agent reaches over the OpenClaw gateway WebSocket — system.run, screen.snapshot, canvas.*, camera.list, camera.snap, camera.clip, location.get, tts.speak, system.notify, system.execApprovals.* — are advertised, on the same machine, as MCP tools over http://127.0.0.1:8765/.

This means any local MCP client (Claude Desktop, Claude Code, Cursor, an MCP-aware CLI, a custom dev script) can reach into the running tray and drive Windows-native capabilities directly, without an OpenClaw gateway in the loop. The tray app can run in MCP-only mode with no gateway connection at all.

The implementation is structured so that adding a new node capability automatically exposes it via MCP — no MCP-side code changes required. That is the central design constraint and the main reason we built MCP in-process rather than as a separate adapter.

Goals

Single source of truth for capabilities. A new INodeCapability registered with WindowsNodeClient.RegisterCapability(...) is reachable via every transport the tray supports. Today: gateway WebSocket and local MCP HTTP. Future transports (named pipe, gRPC, whatever) plug in the same way.
Local-first development. Capabilities can be exercised on Windows without standing up an OpenClaw gateway, without an account, without gateway authentication, without a tunnel.
Make MCP clients first-class consumers of the OpenClaw native node, not afterthoughts. The tooling investment in capabilities (camera consent flows, exec approval policy, canvas WebView2 plumbing) pays off in both directions: agent-via-gateway and agent-via-local-MCP.

Non-goals (for this iteration)

No remote authentication. Loopback bind + Origin/Host checks keep the endpoint unreachable from any other machine. A local bearer token guards against untrusted local processes on the same box (see Authentication below). We will revisit ACLs / multi-user when we want remote MCP, multiple users on one box, or shared dev VMs.
No SSE / streaming. Plain JSON-RPC request/response is enough for the synchronous capabilities we have today.
No per-tool input schemas. Capabilities don't expose schemas; MCP inputSchema is permissive ({type: "object", additionalProperties: true}). When/if INodeCapability grows a schema property, the MCP bridge picks it up with no other changes.
No port configuration UI. Default 8765 is hardcoded. Easy to lift into SettingsManager later.

Architecture

Single capability registry, two transports

                ┌─────────────────────────────────────────────┐
                │                NodeService                  │
                │                                             │
                │   List<INodeCapability> _capabilities ◄───┐ │
                │                                           │ │
                │   private void Register(INodeCapability)  │ │
                │   {                                       │ │
                │       _capabilities.Add(cap);             │ │
                │       _nodeClient?.RegisterCapability(cap)│ │
                │   }                                       │ │
                └────┬───────────────────────┬──────────────┘─┘
                     │                       │
                     │                       │
                     ▼                       ▼
          ┌─────────────────────┐  ┌─────────────────────┐
          │ WindowsNodeClient   │  │ McpToolBridge       │
          │ (gateway WebSocket) │  │ (JSON-RPC dispatch) │
          └─────────┬───────────┘  └─────────┬───────────┘
                    │                        │
                    ▼                        ▼
            OpenClaw gateway          McpHttpServer
                                  (HttpListener@127.0.0.1:8765)
                                            │
                                            ▼
                                Local MCP clients
                            (Claude Code, Cursor, etc.)

The capability list lives on NodeService, not on WindowsNodeClient. That single change is what makes MCP-only mode possible: the gateway client is now optional. When it exists, Register(cap) pushes capabilities into both the local list and the gateway client's registration message. When it doesn't (MCP-only), capabilities still populate the local list and the MCP bridge serves them.
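The ownership change described above can be illustrated with a minimal Python sketch. It is not the tray's C# code: `Capability`, `GatewayClient`, and the method names are hypothetical stand-ins for INodeCapability, WindowsNodeClient, and Register, chosen only to show that the local list is always populated while the gateway push is conditional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Capability:
    """Hypothetical stand-in for INodeCapability."""
    category: str
    commands: List[str]


@dataclass
class GatewayClient:
    """Hypothetical stand-in for WindowsNodeClient."""
    registered: List[Capability] = field(default_factory=list)

    def register_capability(self, cap: Capability) -> None:
        self.registered.append(cap)


@dataclass
class NodeService:
    """Owns the capability list; the gateway client is optional."""
    node_client: Optional[GatewayClient] = None
    capabilities: List[Capability] = field(default_factory=list)

    def register(self, cap: Capability) -> None:
        self.capabilities.append(cap)            # always: the list MCP reads
        if self.node_client is not None:         # only when a gateway client exists
            self.node_client.register_capability(cap)


# MCP-only mode: no gateway client, capabilities still land in the local list.
mcp_only = NodeService()
mcp_only.register(Capability("screen", ["screen.snapshot"]))

# Gateway mode: the same call also reaches the gateway client.
with_gateway = NodeService(node_client=GatewayClient())
with_gateway.register(Capability("screen", ["screen.snapshot"]))
```

In both cases the MCP side only ever looks at `capabilities`, which is why it does not care whether a gateway client was created.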

MCP bridge

OpenClaw.Shared/Mcp/McpToolBridge.cs is transport-agnostic JSON-RPC 2.0. It implements:

initialize — protocol version 2024-11-05, server info.
tools/list — flattens _capabilities into MCP tools. Tool name = command name ("screen.snapshot"); description = "{category} capability: {command}"; inputSchema is permissive.
tools/call — finds the capability via INodeCapability.CanHandle(name), builds a NodeInvokeRequest (the same struct the gateway path uses), calls ExecuteAsync, wraps the result as MCP content[].text. Tool failures come back as result.isError = true, not JSON-RPC errors (per MCP spec — JSON-RPC errors are reserved for protocol issues).
ping, notifications/initialized — protocol housekeeping.

The bridge takes a Func<IReadOnlyList<INodeCapability>> rather than a snapshot. Every tools/list re-reads the live list. This is what guarantees zero-cost capability addition — register a new capability after server start and it appears on the next tools/list.
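A rough Python sketch of that dispatch shape follows. It is an illustration under assumptions, not a port of McpToolBridge: `FakeCapability` and `ToolBridge` are hypothetical names, each capability handles a single command, and reporting an unknown tool as JSON-RPC error -32602 is an assumption (the text above does not say how the real bridge reports it). What it does show is the live provider function and the isError convention for tool failures.

```python
import json
from typing import Callable, Sequence


class FakeCapability:
    """Hypothetical stand-in for INodeCapability with a single command."""

    def __init__(self, category, command, handler):
        self.category, self.command, self.handler = category, command, handler

    def can_handle(self, name):
        return name == self.command

    def execute(self, args):
        return self.handler(args)


class ToolBridge:
    def __init__(self, provider: Callable[[], Sequence[FakeCapability]]):
        self._provider = provider  # called on every request, never cached

    @staticmethod
    def _ok(req_id, result):
        return {"jsonrpc": "2.0", "id": req_id, "result": result}

    @staticmethod
    def _err(req_id, code, message):
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": code, "message": message}}

    def handle(self, request: dict) -> dict:
        req_id, method = request.get("id"), request.get("method")
        if method == "tools/list":
            tools = [{
                "name": c.command,
                "description": f"{c.category} capability: {c.command}",
                "inputSchema": {"type": "object", "additionalProperties": True},
            } for c in self._provider()]
            return self._ok(req_id, {"tools": tools})
        if method == "tools/call":
            params = request.get("params") or {}
            name = params.get("name")
            cap = next((c for c in self._provider() if c.can_handle(name)), None)
            if cap is None:
                # Assumption: unknown tools are treated as a protocol error.
                return self._err(req_id, -32602, f"Unknown tool: {name}")
            try:
                text = json.dumps(cap.execute(params.get("arguments") or {}))
                is_error = False
            except Exception as exc:
                # Tool failure: a normal result flagged isError, not a JSON-RPC error.
                text, is_error = str(exc), True
            return self._ok(req_id, {"content": [{"type": "text", "text": text}],
                                     "isError": is_error})
        return self._err(req_id, -32601, f"Method not found: {method}")


def boom(args):
    raise RuntimeError("camera busy")


caps = []
bridge = ToolBridge(lambda: caps)
before = bridge.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
caps.append(FakeCapability("screen", "screen.snapshot", lambda a: {"ok": True}))
caps.append(FakeCapability("camera", "camera.snap", boom))
after = bridge.handle({"jsonrpc": "2.0", "id": 2, "method": "tools/list"})
failed = bridge.handle({"jsonrpc": "2.0", "id": 3, "method": "tools/call",
                        "params": {"name": "camera.snap", "arguments": {}}})
```

Because the bridge holds a function rather than a list, the capabilities appended after construction show up in `after` without the bridge being rebuilt.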

HTTP transport

OpenClaw.Shared/Mcp/McpHttpServer.cs is System.Net.HttpListener bound to http://127.0.0.1:8765/. Loopback-only by construction; not reachable from any other machine even with firewall holes. A defensive IPAddress.IsLoopback check on each request acts as belt-and-suspenders.

GET / returns a friendly text probe. POST / is JSON-RPC. Anything else → 405. When a bearer token is configured, every verb must pass the token gate before method dispatch.

Authentication

The HTTP transport requires a bearer token on every request. Defense-in-depth on top of loopback bind + Origin/Host checks: if an attacker can run code in any local user context they can reach 127.0.0.1:8765, so we don't want the listener to be open-by-construction.

Where the token lives. %APPDATA%\OpenClawTray\mcp-token.txt. The exact path is composed by NodeService.McpTokenPath from SettingsManager.SettingsDirectoryPath, so the test-suite override OPENCLAW_TRAY_DATA_DIR isolates the token file too. The file inherits the parent directory's ACL — by default only the current user (and SYSTEM/Administrators) can read it.

When it's created. Lazily, on the first NodeService.StartMcpServer() call — i.e. the first time the user enables Local MCP Server in Settings and saves. Until that toggle has been on at least once, the file does not exist. This trips up users who try to grab the token before flipping the switch.

How long it is. 32 bytes of CSPRNG output, base64url-encoded with padding stripped → 43 ASCII characters (~256 bits of entropy). See McpAuthToken.Generate().

Lifetime. The token is persistent across tray restarts. It's only regenerated if the file is deleted or its contents are emptied. There is no automatic rotation.
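The length arithmetic and the create-once behavior can be checked with a short Python sketch. This is not McpAuthToken.Generate() itself: `generate_token` and `load_or_create_token` are hypothetical names, and the file handling is a simplified assumption that ignores ACLs and concurrent writers.

```python
import base64
import secrets
import tempfile
from pathlib import Path


def generate_token() -> str:
    """32 CSPRNG bytes, base64url-encoded, padding stripped: 43 ASCII chars."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def load_or_create_token(path: Path) -> str:
    """Reuse a non-empty token file; otherwise generate and persist a new token."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    token = generate_token()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(token, encoding="utf-8")
    return token


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp-token.txt"
    first = load_or_create_token(path)     # file missing: token is generated
    second = load_or_create_token(path)    # file present: same token is reused
    path.write_text("", encoding="utf-8")
    third = load_or_create_token(path)     # emptied file: token is regenerated
```

32 bytes encode to 44 base64 characters including one `=` pad, so stripping the padding leaves 43.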

On the wire. Every request must carry Authorization: Bearer <token> when the server has a configured token. Missing or wrong token → 401 Unauthorized with no body. GET / remains a "yes I'm here" probe after auth passes.
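The ordering described here (token gate first, then verb dispatch) might look like the following Python sketch. The names `token_ok` and `route` are hypothetical, the sketch assumes a token is always configured, and the constant-time comparison is an assumption about good practice rather than something the text above states.

```python
import hmac
from typing import Optional, Tuple


def token_ok(authorization: Optional[str], expected: str) -> bool:
    """True only for an 'Authorization: Bearer <token>' value that matches."""
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return False
    presented = authorization[len(prefix):]
    # Assumption: compare in constant time to avoid leaking match length.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def route(method: str, authorization: Optional[str], expected: str) -> Tuple[int, str]:
    """Return (status, handler) for a request; the token gate runs before dispatch."""
    if not token_ok(authorization, expected):
        return 401, ""            # missing or wrong token: no body
    if method == "GET":
        return 200, "probe"       # friendly text probe
    if method == "POST":
        return 200, "json-rpc"    # handed to the bridge
    return 405, ""                # any other verb
```

Note that an unauthenticated request with an unsupported verb gets 401, not 405, because the gate runs first.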

How users find it. Settings → Developer Mode → MCP section shows the live token (masked, with Reveal/Copy buttons) and the storage path. For agents that read from disk (Claude Code, custom scripts), pointing them at McpTokenPath is preferable to embedding the token in their prompt or config — the path is stable, the token is a secret. For agents that only accept literal bearer values in config (Claude Desktop, Cursor), use Copy.

Settings model

Two independent toggles in SettingsData:

public bool EnableNodeMode { get; set; }      // open WebSocket to gateway
public bool EnableMcpServer { get; set; }     // run local MCP HTTP server

| `EnableNodeMode` | `EnableMcpServer` | Result |
| --- | --- | --- |
| off | off | Operator-only (legacy default) |
| off | on | MCP server only, no gateway |
| on | off | Gateway node, no MCP |
| on | on | Gateway node + MCP |
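The four combinations reduce to a small decision function. The Python sketch below is illustrative only: `plan_startup` and its keys are hypothetical, and the rule that the local node service starts when either toggle is on is inferred from the capability registry living on NodeService.

```python
def plan_startup(enable_node_mode: bool, enable_mcp_server: bool) -> dict:
    """Map the two independent toggles to what gets started."""
    return {
        # Assumption: the node service owns the capability list, so either mode needs it.
        "node_service": enable_node_mode or enable_mcp_server,
        "gateway_node": enable_node_mode,    # open WebSocket to gateway
        "mcp_server": enable_mcp_server,     # run local MCP HTTP server
    }


operator_only = plan_startup(False, False)
mcp_only = plan_startup(False, True)
```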

Settings UI exposes both toggles in the Advanced section, with the live MCP endpoint URL and current status (Listening / Stopped — save and restart to start / Disabled).

A legacy McpOnlyMode field is migrated automatically on load and never re-written.

Why this matters

Testing

The tray's most interesting code lives in capabilities — system.run (LocalCommandRunner + ExecApprovalPolicy), screen.snapshot (Windows.Graphics.Capture + GraphicsCapturePicker), canvas.* (WebView2 with trusted origin enforcement), camera.snap/camera.clip (MediaCapture + consent prompt), location.get (Windows.Devices.Geolocation). All of that has nontrivial Windows-only behavior and almost none of it is currently exercised end-to-end without first standing up a gateway and authenticating.

Local MCP changes that. Concrete benefits:

Manual smoke tests in seconds. curl -s -X POST http://127.0.0.1:8765/ -H "Authorization: Bearer <token>" -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' validates that the capability dispatch path works, that the WinUI dispatcher marshaling is correct, and that the result shape matches expectations. No gateway, no gateway credentials, no SSH tunnel.
Reproducible bug reports. A repro becomes a tools/call body the bug filer can paste verbatim. No "what was the gateway doing at the time."
Integration tests against a real instance. A future tests/integration/ project can spin up the tray in MCP-only mode, fire JSON-RPC, assert results. The test bodies a developer runs by hand are the same ones CI runs. (Harnessing WinUI itself in CI is harder, but the bridge logic in McpToolBridge is already covered by McpToolBridgeTests with no UI involvement.)
Coverage for the dispatch path itself. WindowsNodeClient's capability-routing logic (CanHandle → ExecuteAsync) was previously only exercised against a live gateway. The MCP server hits the same code paths, so any local MCP test is implicit coverage of the gateway dispatch.
Bridge unit tests already exist. tests/OpenClaw.Shared.Tests/McpToolBridgeTests.cs (9 cases) covers initialize, tools/list, runtime capability registration, tool calls, unknown tools, capability failures, JSON-RPC unknown method, notifications, and parse errors. These are pure C# unit tests with fake capabilities — no HTTP, no UI, no gateway.

Access from CLIs and agents

The exact same node tools the OpenClaw gateway uses are now invocable by any local MCP-aware client:

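Claude Code. Add to ~/.claude.json or per-project .mcp.json (the headers entry supplies the bearer token described under Authentication; replace <token> with the real value):
```
{
  "mcpServers": {
    "openclaw-tray": {
      "type": "http",
      "url": "http://127.0.0.1:8765/",
      "headers": {
        "Authorization": "Bearer <token>"
      }
    }
  }
}
```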
The agent then sees screen.snapshot, system.run, canvas.*, etc. as tools, with whatever arguments the capability accepts.
Claude Desktop. Same config shape under MCP servers.
Cursor. Same.
GitHub Copilot CLI / Copilot in the terminal. As MCP support lands in those clients, the endpoint is already there.
Custom dev scripts. Anything that can speak HTTP + JSON-RPC. A 30-line Python or Node helper can drive the entire capability surface.

In all cases the user gets a Windows-native agent experience without OpenClaw infrastructure. They can be entirely offline w.r.t. an OpenClaw gateway and still hand the LLM a working set of "do something on my Windows box" tools.
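As a sketch of the "custom dev script" case, the Python helper below builds and sends JSON-RPC requests using only the standard library. `build_request` and `call` are hypothetical helper names, the token value is a placeholder, and the example only constructs a request; nothing is sent until `call` is invoked against a running tray.

```python
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:8765/"


def build_request(method, params=None, token=None, req_id=1):
    """Build a POST carrying one JSON-RPC 2.0 request."""
    body = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        body["params"] = params
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call(method, params=None, token=None):
    """Send the request and return the decoded JSON-RPC response."""
    request = build_request(method, params, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())


# Construct (but do not send) a tools/call request for screen.snapshot.
req = build_request(
    "tools/call",
    {"name": "screen.snapshot", "arguments": {}},
    token="example-token",  # placeholder, not a real token
)
```

With the tray running, `call("tools/list", token=...)` would return the live tool list; reading the token from the token file rather than hardcoding it keeps the secret out of the script.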

Dev acceleration when building new features

This is the strongest argument for making MCP a first-class citizen, not an afterthought.

When a contributor adds a new capability — say, clipboard.read, clipboard.write, windows.list, audio.transcribe, git.status, office.draft_email — today the workflow looks like:

Implement INodeCapability.
Wire it into NodeService.RegisterCapabilities().
Stand up a gateway, authenticate, pair the device, etc., to test.
Drive the capability from within an agent conversation, observing logs and taking screenshots to confirm correctness.

With MCP in-process the workflow shortens to:

Implement INodeCapability.
Wire it into NodeService.RegisterCapabilities().
Restart the tray. The new tool is immediately visible to any local MCP client (tools/list re-reads the registry every call), and to manual curl tests.

The dev loop for capabilities is now identical to the dev loop for any local HTTP server: edit, restart, hit the endpoint, observe. No gateway, no agent, no gateway authentication or pairing.

This compounds when you stack it with Claude Code or Cursor on the same machine. A contributor can:

Open the repo in their IDE.
Run the tray with EnableMcpServer = true.
Have Claude Code connected to the same MCP endpoint.
Iterate on a new capability while the agent — using that very capability — helps drive the iteration. The capability under development can be invoked by the assistant on the next turn after a tray restart. That's a tight self-hosted feedback loop.

It also reduces the cost of "speculative" capabilities. Today, adding a capability has a tax: it must be useful enough to justify the extra surface in the gateway/agent stack. With local MCP, a contributor can build a capability speculatively, validate it against their own MCP-aware agent, and only later decide whether to formalize it for gateway use. That lowers the bar for experimentation.

Security model

The server is built on several defensive layers, not just one. Loopback alone is not sufficient: a browser tab the user opens is also on the loopback interface, so a malicious page could otherwise reach http://127.0.0.1:8765/ directly.

Loopback bind. HttpListener is registered with the prefix http://127.0.0.1:8765/. The Windows kernel binds the listening socket to the loopback interface only — packets from other interfaces are not delivered to it. Firewall configuration is irrelevant. Defends against: another machine on the network.
Defensive IsLoopback check. Each incoming request validates ctx.Request.RemoteEndPoint.Address. Belt-and-suspenders for the loopback bind above.
CSRF / browser gate. Each request is rejected if any of the following holds:
- the request carries an Origin header (real MCP clients — Claude Desktop, Cursor, Claude Code, curl — never send Origin; browsers always do for cross-origin fetches);
- the Host header is anything other than 127.0.0.1[:port] or localhost[:port] (defends against DNS-rebinding pivots);
- on POST, the Content-Type is anything other than application/json (forces a CORS preflight from a browser, which we never satisfy).
- the request body exceeds 4 MiB (DoS / OOM cap).
Together these three checks force a malicious cross-origin browser fetch into a CORS preflight that we deliberately do not honor (no Access-Control-Allow-* is ever emitted), so the actual call is blocked before reaching capability code.
Concurrency cap. A semaphore limits in-flight handlers to 8. A misbehaving local client cannot pin every threadpool thread on long-running screen/camera calls.
Capability-level controls remain in force. SystemCapability.SetApprovalPolicy(...) (the exec approval policy) still gates system.run. Camera and screen capture still go through Windows consent flows. MCP doesn't bypass any of those.
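The request gate described above can be modeled as a single pure function. This is a language-neutral sketch in Python, not the tray's C# code: the function names, the order of the checks, the 403 for a non-loopback peer, and the 413 for an oversized body are assumptions. Only the 403 (Origin, Host) and 415 (Content-Type) statuses come from this document.

```python
import ipaddress

MAX_BODY_BYTES = 4 * 1024 * 1024           # 4 MiB cap from the list above
ALLOWED_HOSTNAMES = {"127.0.0.1", "localhost"}


def host_is_local(host):
    """Accept only 127.0.0.1[:port] or localhost[:port]."""
    name, sep, port = host.partition(":")
    if sep and not port.isdigit():
        return False
    return name.lower() in ALLOWED_HOSTNAMES


def gate_status(method, headers, remote_addr, content_length=0):
    """Return None if the request may proceed, else the HTTP status to reject with.

    `headers` is a dict with lower-case header names.
    """
    # Defensive loopback check (status code is an assumption).
    if not ipaddress.ip_address(remote_addr).is_loopback:
        return 403
    # Browsers attach Origin to cross-origin fetches; real MCP clients do not.
    if "origin" in headers:
        return 403
    # DNS-rebinding defense: the Host header must name the loopback endpoint.
    if not host_is_local(headers.get("host", "")):
        return 403
    if method == "POST":
        # Assumption: parameters such as "; charset=utf-8" are tolerated.
        media_type = headers.get("content-type", "").split(";", 1)[0].strip().lower()
        if media_type != "application/json":
            return 415
    # Body size cap (status code is an assumption).
    if content_length > MAX_BODY_BYTES:
        return 413
    return None
```

Keeping the gate free of I/O makes each rule easy to unit-test without starting a listener.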

Still no authentication. Any user-context local process with a TCP socket and the port number can drive any capability. This is the same trust boundary as anything that runs as the user — a malicious process on the box could already invoke arbitrary Win32 APIs without going through MCP. We don't try to stop user-context processes from talking to MCP. If that turns out to matter (multi-user shared boxes, low-trust local processes), the right answer is per-call bearer tokens issued by the tray (one-time copy-to-clipboard from the Settings UI), not URL ACLs or HTTPS — both add deployment pain without solving the actual problem.

Verifying the gate

These should all be rejected with 403 Forbidden:

# Browser pretending to come from another origin
curl -X POST http://127.0.0.1:8765/ -H "Origin: https://evil.com" -H "Content-Type: application/json" -d '{}'

# DNS rebinding attempt
curl -X POST http://127.0.0.1:8765/ -H "Host: evil.com" -H "Content-Type: application/json" -d '{}'

This should be rejected with 415 Unsupported Media Type:

curl -X POST http://127.0.0.1:8765/ -H "Content-Type: text/plain" --data '{"jsonrpc":"2.0","id":1,"method":"ping"}'

These should succeed:

curl http://127.0.0.1:8765/ -H "Authorization: Bearer <token>"   # GET probe
curl -X POST http://127.0.0.1:8765/ -H "Authorization: Bearer <token>" -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","id":1,"method":"ping"}'
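The rejection checks above can be turned into a repeatable script. A minimal sketch in Python, assuming the default port 8765; the CASES table mirrors the three curl rejections, and the function names are hypothetical. The success cases are left out because they depend on a real token.

```python
import http.client

PING = b'{"jsonrpc":"2.0","id":1,"method":"ping"}'

# (description, method, headers, body, expected status), mirroring the curl checks
CASES = [
    ("foreign Origin", "POST",
     {"Origin": "https://evil.com", "Content-Type": "application/json"}, b"{}", 403),
    ("foreign Host", "POST",
     {"Host": "evil.com", "Content-Type": "application/json"}, b"{}", 403),
    ("wrong Content-Type", "POST",
     {"Content-Type": "text/plain"}, PING, 415),
]


def send(method, headers, body, host="127.0.0.1", port=8765):
    """Send one request to the local MCP endpoint and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # http.client does not add its own Host header when one is supplied.
        conn.request(method, "/", body=body, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()


def failures(send_fn=send):
    """Run every case and return (description, expected, got) for each mismatch."""
    mismatches = []
    for description, method, headers, body, expected in CASES:
        got = send_fn(method, headers, body)
        if got != expected:
            mismatches.append((description, expected, got))
    return mismatches
```

With the tray running, an empty list from failures() means all three rejections behaved as documented.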

What's deliberately deferred

These are reasonable next steps but explicitly out of scope for the initial implementation:

- Per-tool input schemas. Add an IReadOnlyDictionary<string, JsonElement> InputSchemas (or per-command descriptor) to INodeCapability. The MCP bridge's HandleToolsList picks them up automatically. Until then, MCP clients see permissive schemas and the agent has to figure out argument shapes from descriptions and trial-and-error.
- ~~Authentication.~~ Implemented. See Authentication below.
- Streamable HTTP / SSE. For long-running tools (screen.record, future audio.transcribe), MCP supports streaming progress. The bridge needs to learn about it and the HTTP server needs to optionally upgrade.
- Resource and prompt support. MCP has resources/* and prompts/* methods we currently no-op. Notifications, recent activity, and channel state could be modeled as MCP resources.
- Configurable port. Move McpDefaultPort into SettingsManager. Probably also pick a free port at startup if the default is in use, and surface the actual port in the Settings UI.
- Setup Wizard step. Today the Settings Advanced section is the only way to enable MCP. The Setup Wizard could offer it as a one-click option, especially attractive for users who don't run a gateway at all.
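The "pick a free port at startup if the default is in use" idea from the configurable-port item can be sketched as follows. This is an illustration in Python under stated assumptions, not the tray's implementation: bind_loopback is a hypothetical name, and because HttpListener is configured with a URL prefix rather than a socket, the C# version would need a different mechanism.

```python
import socket

DEFAULT_PORT = 8765  # default port used throughout this document


def bind_loopback(preferred_port=DEFAULT_PORT):
    """Return a listening TCP socket on 127.0.0.1.

    Tries the preferred port first; if it is already taken, falls back to
    port 0, which asks the OS to assign any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", preferred_port))
    except OSError:
        # Preferred port is busy: start over with an OS-assigned port.
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", 0))
    sock.listen()
    return sock
```

The port actually in use is available from sock.getsockname()[1], which is the value a Settings UI would need to surface.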

File map

| File | Role |
| --- | --- |
| `src/OpenClaw.Shared/Mcp/McpToolBridge.cs` | Transport-agnostic JSON-RPC dispatcher. |
| `src/OpenClaw.Shared/SettingsData.cs` | Settings JSON model. Adds `EnableMcpServer`; deprecates `McpOnlyMode`. |
| `src/OpenClaw.Shared/Mcp/McpHttpServer.cs` | `HttpListener`-based loopback HTTP transport. |
| `src/OpenClaw.Tray.WinUI/Services/NodeService.cs` | Owns the capability list. Hosts the MCP server when enabled. |
| `src/OpenClaw.Tray.WinUI/Services/SettingsManager.cs` | In-memory settings model + load/save. Migrates legacy `McpOnlyMode`. |
| `src/OpenClaw.Tray.WinUI/Pages/SettingsPage.xaml(.cs)` | Settings UI surface hosted by `HubWindow`. |
| `src/OpenClaw.Tray.WinUI/App.xaml.cs` | Bootstraps `NodeService` based on the new mode matrix. |
| `tests/OpenClaw.Shared.Tests/McpToolBridgeTests.cs` | 9 unit tests for the bridge. |

Quick verification

With the tray running and EnableMcpServer = true:

# Server is up
curl http://127.0.0.1:8765/

# List tools
curl -s -X POST http://127.0.0.1:8765/ `
  -H "Content-Type: application/json" `
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Take a screenshot of the primary monitor
curl -s -X POST http://127.0.0.1:8765/ `
  -H "Content-Type: application/json" `
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"screen.snapshot"}}'

For Claude Code, drop this into .mcp.json at the repo root or ~/.claude.json:

{
  "mcpServers": {
    "openclaw-tray": {
      "type": "http",
      "url": "http://127.0.0.1:8765/"
    }
  }
}
