chore(sync): mirror docs from openclaw/openclaw@5457462e62
This commit is contained in:
parent
6c3b443b28
commit
b3ee5ce065
@ -1,15 +1,15 @@
|
||||
{
|
||||
"repository": "openclaw/openclaw",
|
||||
"sha": "30e079dd89b451ca22cb360ca887bd9367cc7939",
|
||||
"sha": "5457462e62670839b1b7d793e22f7f38a76b8b0c",
|
||||
"sources": {
|
||||
"openclaw": {
|
||||
"repository": "openclaw/openclaw",
|
||||
"sha": "30e079dd89b451ca22cb360ca887bd9367cc7939"
|
||||
"sha": "5457462e62670839b1b7d793e22f7f38a76b8b0c"
|
||||
},
|
||||
"clawhub": {
|
||||
"repository": "openclaw/clawhub",
|
||||
"sha": "38c21345906ab1f107a91b33bb86b63667d96643"
|
||||
}
|
||||
},
|
||||
"syncedAt": "2026-05-08T13:03:40.138Z"
|
||||
"syncedAt": "2026-05-08T13:17:34.583Z"
|
||||
}
|
||||
|
||||
@ -1172,6 +1172,7 @@ Auto-join example:
|
||||
discord: {
|
||||
voice: {
|
||||
enabled: true,
|
||||
mode: "stt-tts",
|
||||
model: "openai/gpt-5.4-mini",
|
||||
autoJoin: [
|
||||
{
|
||||
@ -1199,8 +1200,10 @@ Auto-join example:
|
||||
Notes:
|
||||
|
||||
- `voice.tts` overrides `messages.tts` for voice playback only.
|
||||
- `voice.model` overrides the LLM used for Discord voice channel responses only. Leave it unset to inherit the routed agent model. Do not set this to `gpt-realtime-2`; Discord voice channels use STT plus TTS playback, not the OpenAI Realtime session transport.
|
||||
- STT uses `tools.media.audio`; `voice.model` does not affect transcription.
|
||||
- `voice.mode` controls the conversation path: `stt-tts` keeps the existing batch STT plus TTS flow, `talk-buffer` uses a realtime voice shell for turn timing/transcription/playback while the OpenClaw agent produces the answer, and `bidi` lets the realtime model converse directly while exposing `openclaw_agent_consult` for the OpenClaw brain.
|
||||
- `voice.model` overrides the OpenClaw agent brain for Discord voice responses and realtime consults. Leave it unset to inherit the routed agent model. It is separate from `voice.realtime.model`.
|
||||
- In `stt-tts` mode, STT uses `tools.media.audio`; `voice.model` does not affect transcription.
|
||||
- In realtime modes, `voice.realtime.provider`, `voice.realtime.model`, and `voice.realtime.voice` configure the realtime audio session. For OpenAI Realtime 2 plus the Codex brain, use `voice.realtime.model: "gpt-realtime-2"` and `voice.model: "openai-codex/gpt-5.5"`.
|
||||
- For an OpenAI voice on Discord playback, set `voice.tts.provider: "openai"` and choose a Text-to-speech voice under `voice.tts.openai.voice` or `voice.tts.providers.openai.voice`. `cedar` is a good masculine-sounding choice on the current OpenAI TTS model.
|
||||
- Per-channel Discord `systemPrompt` overrides apply to voice transcript turns for that voice channel.
|
||||
- Voice transcript turns derive owner status from Discord `allowFrom` (or `dm.allowFrom`); non-owner speakers cannot access owner-only tools (for example `gateway` and `cron`).
|
||||
@ -1211,7 +1214,7 @@ Notes:
|
||||
- `@discordjs/voice` defaults are `daveEncryption=true` and `decryptionFailureTolerance=24` if unset.
|
||||
- `voice.connectTimeoutMs` controls the initial `@discordjs/voice` Ready wait for `/vc join` and auto-join attempts. Default: `30000`.
|
||||
- `voice.reconnectGraceMs` controls how long OpenClaw waits for a disconnected voice session to begin reconnecting before destroying it. Default: `15000`.
|
||||
- Voice playback does not stop just because another user starts speaking. To avoid feedback loops, OpenClaw ignores new voice capture while TTS is playing; speak after playback finishes for the next turn.
|
||||
- In `stt-tts` mode, voice playback does not stop just because another user starts speaking. To avoid feedback loops, OpenClaw ignores new voice capture while TTS is playing; speak after playback finishes for the next turn. Realtime modes forward speaker starts as barge-in signals to the realtime provider.
|
||||
- `voice.captureSilenceGraceMs` controls how long OpenClaw waits after Discord reports a speaker has stopped before finalizing that audio segment for STT. Default: `2500`; raise this if Discord splits normal pauses into choppy partial transcripts.
|
||||
- When ElevenLabs is the selected TTS provider, Discord voice playback uses streaming TTS and starts from the provider response stream. Providers without streaming support fall back to the synthesized temp-file path.
|
||||
- OpenClaw also watches receive decrypt failures and auto-recovers by leaving/rejoining the voice channel after repeated failures in a short window.
|
||||
@ -1219,7 +1222,7 @@ Notes:
|
||||
- `The operation was aborted` receive events are expected when OpenClaw finalizes a captured speaker segment; they are verbose diagnostics, not warnings.
|
||||
- Verbose Discord voice logs include a bounded one-line STT transcript preview for each accepted speaker segment, so debugging shows both the user side and the agent reply side without dumping unbounded transcript text.
|
||||
|
||||
Voice channel pipeline:
|
||||
STT plus TTS pipeline:
|
||||
|
||||
- Discord PCM capture is converted to a WAV temp file.
|
||||
- `tools.media.audio` handles STT, for example `openai/gpt-4o-mini-transcribe`.
|
||||
@ -1227,7 +1230,51 @@ Voice channel pipeline:
|
||||
- `voice.model`, when set, overrides only the response LLM for this voice-channel turn.
|
||||
- `voice.tts` is merged over `messages.tts`; streaming-capable providers feed the player directly, otherwise the resulting audio file is played in the joined channel.
|
||||
|
||||
Credentials are resolved per component: LLM route auth for `voice.model`, STT auth for `tools.media.audio`, and TTS auth for `messages.tts`/`voice.tts`.
|
||||
Realtime talk-buffer example:
|
||||
|
||||
```json5
|
||||
{
|
||||
channels: {
|
||||
discord: {
|
||||
voice: {
|
||||
enabled: true,
|
||||
mode: "talk-buffer",
|
||||
model: "openai-codex/gpt-5.5",
|
||||
realtime: {
|
||||
provider: "openai",
|
||||
model: "gpt-realtime-2",
|
||||
voice: "cedar",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Realtime bidi example:
|
||||
|
||||
```json5
|
||||
{
|
||||
channels: {
|
||||
discord: {
|
||||
voice: {
|
||||
enabled: true,
|
||||
mode: "bidi",
|
||||
model: "openai-codex/gpt-5.5",
|
||||
realtime: {
|
||||
provider: "openai",
|
||||
model: "gpt-realtime-2",
|
||||
voice: "cedar",
|
||||
toolPolicy: "safe-read-only",
|
||||
consultPolicy: "always",
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Credentials are resolved per component: LLM route auth for `voice.model`, STT auth for `tools.media.audio`, TTS auth for `messages.tts`/`voice.tts`, and realtime provider auth for `voice.realtime.providers` or the provider's normal auth config.
|
||||
|
||||
### Voice messages
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user