Connecting Agents to Local Ollama Models
How to route a specific OpenClaw agent to a local Ollama model without changing global defaults.
Why Route Individual Agents
Running a local model through Ollama saves API cost, keeps sensitive data on your network, and gives you control over latency. But you usually don't want every agent switched over — cloud models still win for complex reasoning. The right pattern is per-agent routing: keep your default provider intact, swap just the agents that benefit from a local model.
The Problem with 2026.3
OpenClaw 2026.3 tightened its config validation. Two things that used to work silently now fail:
- Standard CLI patterns like agents.growth-agent.model don't resolve, because agents live in an array, not a keyed object
- Tool-calling protocols clash with distilled local models that don't implement the tools field, producing 400 - does not support tools
Both are solvable. The fix is three steps.
1. Register the Ollama Provider
OpenClaw needs to know the provider exists before any agent can reference it. Every model entry requires both id and name in 2026.3 — the schema will reject partial definitions.
openclaw config set models.providers.ollama '{
"baseUrl": "http://10.20.170.2:11434",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{"id": "glm-4.7-flash:latest", "name": "GLM 4.7 Flash"},
{"id": "qwen3.5:27b", "name": "Qwen 3.5 27B"}
]
}' --json
The apiKey is required by the schema but unused by Ollama — any non-empty string works.
If your Ollama host is on a different machine, start it with OLLAMA_HOST=0.0.0.0 ollama serve. By default Ollama only listens on localhost, which blocks remote connections.
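Before wiring up any agent, confirm the machine running OpenClaw can actually reach Ollama. A quick probe of Ollama's /api/tags endpoint, which lists the models you've pulled, catches bind-address and firewall problems early. The IP below matches the provider config above; substitute your own:

# A JSON response listing your pulled models, with exact :tag names,
# confirms connectivity; a hang or connection refusal means fix the
# OLLAMA_HOST binding or firewall before touching OpenClaw config.
curl -s http://10.20.170.2:11434/api/tags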
2. Assign the Model to One Agent
Agents are stored as an array in the config, not a map. Target them by index.
Find the position of the agent you want to change:
openclaw agents list --verbose
Note the index of growth-agent in the output, then set the model on that slot:
# agents.list.5 = growth-agent's configuration slot
openclaw config set agents.list.5.model "ollama/glm-4.7-flash:latest"
The provider/model format is required — bare model IDs won't resolve.
Agent indexes shift when you add or remove agents. If you're scripting this, parse the output of agents list --json to find the index dynamically instead of hardcoding it.
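A minimal sketch of that dynamic lookup with jq, assuming the --json output mirrors the config layout (a list array whose entries carry a name field). That shape is an assumption; verify it against your own output before relying on it:

# Resolve growth-agent's index at runtime instead of hardcoding 5.
# The .list and .name paths are assumptions about the output shape;
# adjust them to match what agents list --json actually prints.
idx=$(openclaw agents list --json | jq '.list | map(.name == "growth-agent") | index(true)')
openclaw config set "agents.list.${idx}.model" "ollama/glm-4.7-flash:latest"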
3. Fix Tool Calling Errors
Distilled and smaller models often don't support the tools API. When OpenClaw sends a tool-enabled request, Ollama returns 400 - does not support tools and the agent fails without surfacing a useful error.
Two ways to handle it:
- Switch to a tool-capable model — glm-4.7-flash, qwen3.5:27b instruct variants, and most non-distilled instruction-tuned models handle tools correctly
- Use an alias — if you're stuck with a specific base model, tag it under a name OpenClaw's protocol detection already handles:
ollama cp original-model-name qwen2.5:latest
The alias lets OpenClaw send the standard protocol without modifying the underlying weights.
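If you're unsure which camp a model falls into, you can probe it directly before pointing an agent at it. This sends a minimal tool-enabled request to Ollama's /api/chat endpoint; the noop tool here is a throwaway placeholder, not anything OpenClaw uses:

# A 400 "does not support tools" response means the agent would hit
# the same wall; a normal chat reply means tool calling is safe.
curl -s http://10.20.170.2:11434/api/chat -d '{
  "model": "glm-4.7-flash:latest",
  "messages": [{"role": "user", "content": "ping"}],
  "stream": false,
  "tools": [{"type": "function", "function": {
    "name": "noop", "description": "probe only",
    "parameters": {"type": "object", "properties": {}}
  }}]
}'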
Verification
Check the configuration took effect:
openclaw agents list --verbose | grep -A 5 "growth-agent"
Then restart the gateway and send a test prompt:
openclaw gateway --force
If the agent responds normally, the routing is working. If it hangs or errors, check the gateway logs for schema validation failures or tool-call rejections.
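It's also worth confirming the prompt actually hit the local model rather than quietly falling back to the default provider. While the test prompt runs, check what Ollama has loaded:

# Lists loaded models with their memory footprint and GPU/CPU split.
# Seeing your model here during the test prompt confirms the routing
# end to end; an empty list means the request went elsewhere.
ollama ps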
Performance Notes
- VRAM — glm-4.7-flash needs 19–21GB. Running on a 16GB card will either refuse to load or spill to CPU (slow). Monitor with nvidia-smi while the agent runs.
- Network latency — local LAN (10.x.x.x) adds 2–5ms per token vs. loopback. Usually negligible, but on a multi-turn agent with long outputs it adds up. Keep Ollama and the gateway in the same subnet.
- First-token latency — Ollama keeps models loaded for 5 minutes by default. The first request after idle pays a load-time tax. If your agent is bursty, consider OLLAMA_KEEP_ALIVE=-1 to pin the model (see the example after this list).
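The keep-alive example referenced above, combined with the remote-bind setting from step 1 so a single launch covers both:

# Listen on all interfaces and never unload models once loaded.
# Weigh the permanent VRAM cost against the VRAM note above if
# the card hosts more than one model.
OLLAMA_HOST=0.0.0.0 OLLAMA_KEEP_ALIVE=-1 ollama serve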
When Local Models Make Sense
Not every agent benefits. Use local models when:
- The agent processes sensitive data that shouldn't leave your network
- The task is narrow and a smaller model is good enough (summarization, classification, formatting)
- Cost per request matters more than latency or peak quality
Keep cloud models for agents doing complex reasoning, long-horizon planning, or tasks where model quality directly affects output value.