diff --git a/.openclaw-sync/source.json b/.openclaw-sync/source.json
index 3076a90dd..2940dfdcb 100644
--- a/.openclaw-sync/source.json
+++ b/.openclaw-sync/source.json
@@ -1,5 +1,5 @@
{
"repository": "openclaw/openclaw",
- "sha": "d48c3e12a5d896cbdad3e7fcf5bba2af644a9ca5",
- "syncedAt": "2026-04-28T05:34:31.416Z"
+ "sha": "59162379627bc0a2c26af78df4823926dfa4f9fc",
+ "syncedAt": "2026-04-28T05:35:28.467Z"
}
diff --git a/docs/cli/onboard.md b/docs/cli/onboard.md
index e0c715034..bb15823dd 100644
--- a/docs/cli/onboard.md
+++ b/docs/cli/onboard.md
@@ -61,10 +61,12 @@ openclaw onboard --non-interactive \
--custom-model-id "foo-large" \
--custom-api-key "$CUSTOM_API_KEY" \
--secret-input-mode plaintext \
- --custom-compatibility openai
+ --custom-compatibility openai \
+ --custom-image-input
```
`--custom-api-key` is optional in non-interactive mode. If omitted, onboarding checks `CUSTOM_API_KEY`.
+OpenClaw marks common vision model IDs as image-capable automatically. Pass `--custom-image-input` for unknown custom vision IDs, or `--custom-text-input` to force text-only metadata.
LM Studio also supports a provider-specific key flag in non-interactive mode:
diff --git a/docs/gateway/config-tools.md b/docs/gateway/config-tools.md
index 979863b03..660656b01 100644
--- a/docs/gateway/config-tools.md
+++ b/docs/gateway/config-tools.md
@@ -456,6 +456,7 @@ OpenClaw uses the built-in model catalog. Add custom providers via `models.provi
- `models.providers.*.models`: explicit provider model catalog entries.
+ - `models.providers.*.models.*.input`: model input modalities. Use `["text"]` for text-only models and `["text", "image"]` for native image/vision models. Image attachments are only injected into agent turns when the selected model is marked image-capable.
- `models.providers.*.models.*.contextWindow`: native model context window metadata. This overrides provider-level `contextWindow` for that model.
- `models.providers.*.models.*.contextTokens`: optional runtime context cap. This overrides provider-level `contextTokens`; use it when you want a smaller effective context budget than the model's native `contextWindow`; `openclaw models list` shows both values when they differ.
- `models.providers.*.models.*.compat.supportsDeveloperRole`: optional compatibility hint. For `api: "openai-completions"` with a non-empty non-native `baseUrl` (host not `api.openai.com`), OpenClaw forces this to `false` at runtime. Empty/omitted `baseUrl` keeps default OpenAI behavior.
@@ -472,6 +473,8 @@ OpenClaw uses the built-in model catalog. Add custom providers via `models.provi
+Interactive custom-provider onboarding infers image input for common vision model IDs, such as GPT-4o, Claude, Gemini, Qwen-VL, LLaVA, Pixtral, InternVL, Mllama, MiniCPM-V, and GLM-4V, and skips the image-support question for known text-only families. For unknown model IDs, onboarding still asks whether the model supports image input. Non-interactive onboarding uses the same inference; pass `--custom-image-input` to force image-capable metadata or `--custom-text-input` to force text-only metadata.
+
### Provider examples
diff --git a/docs/gateway/local-models.md b/docs/gateway/local-models.md
index a5480dd12..9eaf26860 100644
--- a/docs/gateway/local-models.md
+++ b/docs/gateway/local-models.md
@@ -168,6 +168,13 @@ catalog id and model ref:
- `models.providers.mlx.models[].id: "mlx-community/Qwen3-30B-A3B-6bit"`
- `agents.defaults.model.primary: "mlx/mlx-community/Qwen3-30B-A3B-6bit"`
+Set `input: ["text", "image"]` on local or proxied vision models so image
+attachments are injected into agent turns. Interactive custom-provider
+onboarding infers common vision model IDs and asks only for unknown names.
+Non-interactive onboarding uses the same inference; use `--custom-image-input`
+for unknown vision IDs or `--custom-text-input` when a model whose ID looks
+image-capable is text-only behind your endpoint.
+
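+As a minimal sketch (the vision model ID below is a hypothetical placeholder,
+and other provider fields such as endpoint settings are omitted), a local
+provider with one text-only and one vision model could be declared like this,
+assuming the config file accepts JSON5-style comments and trailing commas:
+
+```json5
+{
+  "models": {
+    "mode": "merge", // keep hosted models available as fallbacks
+    "providers": {
+      "mlx": {
+        "models": [
+          // Marked text-only: image attachments are not injected for this model.
+          { "id": "mlx-community/Qwen3-30B-A3B-6bit", "input": ["text"] },
+          // Hypothetical vision model ID, for illustration only.
+          { "id": "example-org/example-vision-model", "input": ["text", "image"] },
+        ],
+      },
+    },
+  },
+}
+```
+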
Keep `models.mode: "merge"` so hosted models stay available as fallbacks.
Use `models.providers..timeoutSeconds` for slow local or remote model
servers before raising `agents.defaults.timeoutSeconds`. The provider timeout
diff --git a/docs/start/wizard-cli-automation.md b/docs/start/wizard-cli-automation.md
index 055fb1ef0..61b342eeb 100644
--- a/docs/start/wizard-cli-automation.md
+++ b/docs/start/wizard-cli-automation.md
@@ -166,11 +166,13 @@ openclaw onboard --non-interactive \
--custom-api-key "$CUSTOM_API_KEY" \
--custom-provider-id "my-custom" \
--custom-compatibility anthropic \
+ --custom-image-input \
--gateway-port 18789 \
--gateway-bind loopback
```
`--custom-api-key` is optional. If omitted, onboarding checks `CUSTOM_API_KEY`.
+ OpenClaw marks common vision model IDs as image-capable automatically. Add `--custom-image-input` for unknown custom vision IDs, or `--custom-text-input` to force text-only metadata.
Ref-mode variant:
@@ -184,6 +186,7 @@ openclaw onboard --non-interactive \
--secret-input-mode ref \
--custom-provider-id "my-custom" \
--custom-compatibility anthropic \
+ --custom-image-input \
--gateway-port 18789 \
--gateway-bind loopback
```
diff --git a/docs/start/wizard-cli-reference.md b/docs/start/wizard-cli-reference.md
index 7ff2efdb9..9ffca43ea 100644
--- a/docs/start/wizard-cli-reference.md
+++ b/docs/start/wizard-cli-reference.md
@@ -202,6 +202,7 @@ What you set:
- `--custom-api-key` (optional; falls back to `CUSTOM_API_KEY`)
- `--custom-provider-id` (optional)
- `--custom-compatibility ` (optional; default `openai`)
+ - `--custom-image-input` / `--custom-text-input` (optional; override inferred model input capability)
@@ -212,6 +213,7 @@ What you set:
Model behavior:
- Pick default model from detected options, or enter provider and model manually.
+- Custom-provider onboarding infers image support for common model IDs and asks only when the model name is unknown.
- When onboarding starts from a provider auth choice, the model picker prefers
that provider automatically. For Volcengine and BytePlus, the same preference
also matches their coding-plan variants (`volcengine-plan/*`,