A practical field guide
Hermes Agent, OpenClaw, OpenCode, and Claude Code all happily talk to a local Ollama model. No API keys, no per-token billing, no data leaving the box. This site is the working notes I wish I had on day one.
The stack
Nous Research's agent runtime. Multi-channel gateway (Telegram, Slack, Discord, iMessage), persistent memory, skills system, cron jobs. ~14k GitHub stars, very active.
Peter Steinberger's open-source personal assistant. Runs locally, talks to 35+ LLM providers including Ollama, plugs into the same chat apps. The "AI that actually does things."
The local LLM runtime. One ollama pull and you have a chat-capable model on your Mac. Hermes and OpenClaw both speak Ollama's OpenAI-compatible API natively.
The coding tier
Hermes and OpenClaw are great for chat, scheduling, and ad-hoc tasks. For actual code work — multi-file refactors, PR review, test generation — hand off to a coding tool that speaks Ollama.
Terminal-first coding agent, open source, model-agnostic. Point it at your local Ollama endpoint and it acts like a junior pair programmer that lives in your shell.
Anthropic's coding CLI. Designed for Claude, but with the right provider config it will run against any OpenAI-compatible endpoint — including your local Ollama server.
Why bother
ollama pull model:tag.What you'll need
16GB unified memory is the practical floor for 14B. 32B+ needs 32GB+ or you'll be paging.
Hermes and OpenClaw are Node. OpenCode is Go binary. Claude Code is Node. Ollama is a single binary.
brew install ollama on macOS, then ollama serve in one terminal. The default endpoint is http://127.0.0.1:11434/v1.
ollama pull qwen2.5-coder:14b for coding, ollama pull llama3.1:8b for chat. See the Ollama page for full picks.