chore(sync): mirror docs from openclaw/openclaw@5457462e62

openclaw-docs-sync[bot] 2026-05-08 13:19:05 +00:00
parent 6c3b443b28
commit b3ee5ce065
2 changed files with 55 additions and 8 deletions


@@ -1,15 +1,15 @@
{
"repository": "openclaw/openclaw",
"sha": "30e079dd89b451ca22cb360ca887bd9367cc7939",
"sha": "5457462e62670839b1b7d793e22f7f38a76b8b0c",
"sources": {
"openclaw": {
"repository": "openclaw/openclaw",
"sha": "30e079dd89b451ca22cb360ca887bd9367cc7939"
"sha": "5457462e62670839b1b7d793e22f7f38a76b8b0c"
},
"clawhub": {
"repository": "openclaw/clawhub",
"sha": "38c21345906ab1f107a91b33bb86b63667d96643"
}
},
"syncedAt": "2026-05-08T13:03:40.138Z"
"syncedAt": "2026-05-08T13:17:34.583Z"
}


@@ -1172,6 +1172,7 @@ Auto-join example:
discord: {
voice: {
enabled: true,
mode: "stt-tts",
model: "openai/gpt-5.4-mini",
autoJoin: [
{
@@ -1199,8 +1200,10 @@ Auto-join example:
Notes:
- `voice.tts` overrides `messages.tts` for voice playback only.
- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model. Do not set this to `gpt-realtime-2`; Discord voice channels use STT plus TTS playback, not the OpenAI Realtime session transport.
- STT uses `tools.media.audio`; `voice.model` does not affect transcription.
- `voice.mode` controls the conversation path: `stt-tts` keeps the existing batch STT plus TTS flow, `talk-buffer` uses a realtime voice shell for turn timing/transcription/playback while the OpenClaw agent produces the answer, and `bidi` lets the realtime model converse directly while exposing `openclaw_agent_consult` for the OpenClaw brain.
- `voice.model` overrides the OpenClaw agent brain for Discord voice responses and realtime consults. Leave it unset to inherit the routed agent model. It is separate from `voice.realtime.model`.
- In `stt-tts` mode, STT uses `tools.media.audio`; `voice.model` does not affect transcription.
- In realtime modes, `voice.realtime.provider`, `voice.realtime.model`, and `voice.realtime.voice` configure the realtime audio session. For OpenAI Realtime 2 plus the Codex brain, use `voice.realtime.model: "gpt-realtime-2"` and `voice.model: "openai-codex/gpt-5.5"`.
- For an OpenAI voice in Discord playback, set `voice.tts.provider: "openai"` and choose a text-to-speech voice under `voice.tts.openai.voice` or `voice.tts.providers.openai.voice`. `cedar` is a good masculine-sounding choice on the current OpenAI TTS model.
- Per-channel Discord `systemPrompt` overrides apply to voice transcript turns for that voice channel.
- Voice transcript turns derive owner status from Discord `allowFrom` (or `dm.allowFrom`); non-owner speakers cannot access owner-only tools (for example `gateway` and `cron`).
@@ -1211,7 +1214,7 @@ Notes:
- `@discordjs/voice` defaults are `daveEncryption=true` and `decryptionFailureTolerance=24` if unset.
- `voice.connectTimeoutMs` controls the initial `@discordjs/voice` Ready wait for `/vc join` and auto-join attempts. Default: `30000`.
- `voice.reconnectGraceMs` controls how long OpenClaw waits for a disconnected voice session to begin reconnecting before destroying it. Default: `15000`.
- Voice playback does not stop just because another user starts speaking. To avoid feedback loops, OpenClaw ignores new voice capture while TTS is playing; speak after playback finishes for the next turn.
- In `stt-tts` mode, voice playback does not stop just because another user starts speaking. To avoid feedback loops, OpenClaw ignores new voice capture while TTS is playing; speak after playback finishes for the next turn. Realtime modes forward speaker starts as barge-in signals to the realtime provider.
- `voice.captureSilenceGraceMs` controls how long OpenClaw waits after Discord reports a speaker has stopped before finalizing that audio segment for STT. Default: `2500`; raise this if Discord splits normal pauses into choppy partial transcripts.
- When ElevenLabs is the selected TTS provider, Discord voice playback uses streaming TTS and starts from the provider response stream. Providers without streaming support fall back to the synthesized temp-file path.
- OpenClaw also watches receive decrypt failures and auto-recovers by leaving/rejoining the voice channel after repeated failures in a short window.
@@ -1219,7 +1222,7 @@ Notes:
- `The operation was aborted` receive events are expected when OpenClaw finalizes a captured speaker segment; they are verbose diagnostics, not warnings.
- Verbose Discord voice logs include a bounded one-line STT transcript preview for each accepted speaker segment, so debugging shows both the user side and the agent reply side without dumping unbounded transcript text.
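The timing notes above can be sketched as a config fragment. This is a minimal illustration, assuming the three keys nest under `channels.discord.voice` like the other `voice.*` settings. The raised `captureSilenceGraceMs` value is only an example and does not come from the source.

```json5
{
  channels: {
    discord: {
      voice: {
        enabled: true,
        // Documented default: initial Ready wait for /vc join and auto-join.
        connectTimeoutMs: 30000,
        // Documented default: wait for a disconnected session to start reconnecting.
        reconnectGraceMs: 15000,
        // Illustrative value above the 2500 default, for channels where
        // normal pauses are being split into choppy partial transcripts.
        captureSilenceGraceMs: 4000,
      },
    },
  },
}
```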
Voice channel pipeline:
STT plus TTS pipeline:
- Discord PCM capture is converted to a WAV temp file.
- `tools.media.audio` handles STT, for example `openai/gpt-4o-mini-transcribe`.
@@ -1227,7 +1230,51 @@ Voice channel pipeline:
- `voice.model`, when set, overrides only the response LLM for this voice-channel turn.
- `voice.tts` is merged over `messages.tts`; streaming-capable providers feed the player directly, otherwise the resulting audio file is played in the joined channel.
Credentials are resolved per component: LLM route auth for `voice.model`, STT auth for `tools.media.audio`, and TTS auth for `messages.tts`/`voice.tts`.
Realtime talk-buffer example:
```json5
{
channels: {
discord: {
voice: {
enabled: true,
mode: "talk-buffer",
model: "openai-codex/gpt-5.5",
realtime: {
provider: "openai",
model: "gpt-realtime-2",
voice: "cedar",
},
},
},
},
}
```
Realtime bidi example:
```json5
{
channels: {
discord: {
voice: {
enabled: true,
mode: "bidi",
model: "openai-codex/gpt-5.5",
realtime: {
provider: "openai",
model: "gpt-realtime-2",
voice: "cedar",
toolPolicy: "safe-read-only",
consultPolicy: "always",
},
},
},
},
}
```
Credentials are resolved per component: LLM route auth for `voice.model`, STT auth for `tools.media.audio`, TTS auth for `messages.tts`/`voice.tts`, and realtime provider auth for `voice.realtime.providers` or the provider's normal auth config.
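For comparison with the realtime examples, here is a minimal sketch of the batch `stt-tts` mode with an OpenAI playback voice. It assumes the nested `tts.openai.voice` layout mirrors the dotted path `voice.tts.openai.voice` from the notes. The `voice.tts.providers.openai.voice` form is the documented alternative and is not shown.

```json5
{
  channels: {
    discord: {
      voice: {
        enabled: true,
        // Batch STT plus TTS flow; transcription still comes from tools.media.audio.
        mode: "stt-tts",
        // Optional: leave unset to inherit the routed agent model.
        model: "openai/gpt-5.4-mini",
        tts: {
          provider: "openai",
          openai: {
            voice: "cedar", // assumed nesting for voice.tts.openai.voice
          },
        },
      },
    },
  },
}
```

In this mode no `voice.realtime` block is needed, since the realtime settings are described as applying to the realtime modes only.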
### Voice messages