The model layer
One binary, one pull command, and you have a chat-capable model serving on http://127.0.0.1:11434. The same API works for Hermes, OpenClaw, OpenCode, and Claude Code. No discrete GPU needed on an M-series Mac: Ollama runs models on the built-in GPU via Metal.
Install
# macOS
brew install ollama
ollama serve
# Linux (one-liner)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve   # skip if the installer already started the systemd service (check: systemctl status ollama)
# Windows: download from ollama.com/download
Leave ollama serve running in a terminal, or on macOS run it as a launchd service (see below). The default endpoint is http://127.0.0.1:11434, and it exposes an OpenAI-compatible API under /v1, which is what every tool on this site uses.
Recommended on macOS: brew services start ollama runs Ollama as a background launchd service, so it survives closing the terminal and starts automatically at login.
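When Ollama runs as a background service, a setup script may reach the next step before the server is listening. A minimal sketch for bash or POSIX sh, assuming curl is installed: wait_for_ollama and OLLAMA_URL are names invented for this example (Ollama itself does not read OLLAMA_URL). It simply polls the root URL, which a running server answers with a short "Ollama is running" message.

```shell
# Hypothetical helper: block until an Ollama server answers, or give up.
# OLLAMA_URL is a variable invented for this sketch; Ollama does not read it.
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

# wait_for_ollama [url] [attempts]
# Polls the root endpoint about once per second; returns 0 as soon as it responds.
wait_for_ollama() {
  local url="${1:-$OLLAMA_URL}" attempts="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failures, --max-time: never hang
    if curl -sf --max-time 2 "$url/" >/dev/null; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    [ "$i" -lt "$attempts" ] && sleep 1
    i=$((i + 1))
  done
  echo "Ollama did not respond at $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, brew services start ollama && wait_for_ollama && ollama list starts the service and lists your local models once the server is up.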
Model picks
There are hundreds of models on ollama.com/library. These four cover ~90% of what you'll actually want:
Meta's Llama 3.1 8B. The safe default. ~5GB RAM. Good at conversation, decent at reasoning, fast on M-series. Use this if you're not sure.
ollama pull llama3.1:8b
Alibaba's Qwen 2.5 Coder 14B. ~9GB RAM. One of the strongest open coding models under 32B. Use it for OpenCode and Claude Code.
ollama pull qwen2.5-coder:14b
DeepSeek's distilled R1. ~5GB RAM. Surfaces its chain-of-thought, which is great for debugging why an agent made a choice.
ollama pull deepseek-r1:8b
LLaVA 13B. Multimodal: it takes images as input. ~8GB RAM. Useful if you want to drop a screenshot into a chat with your agent.
ollama pull llava:13b
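To set up a fresh machine, you can script the pulls and skip anything already downloaded. A minimal sketch, assuming ollama is on your PATH and that ollama list prints a header row followed by one model per line with the name in the first column; has_model and ensure_models are hypothetical helper names.

```shell
# has_model NAME: reads `ollama list` output on stdin and succeeds if NAME
# matches the first column exactly (the header row is skipped).
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# ensure_models NAME...: pull only the models that are not already present.
ensure_models() {
  local installed m
  installed="$(ollama list)" || return 1
  for m in "$@"; do
    if printf '%s\n' "$installed" | has_model "$m"; then
      echo "already have $m"
    else
      ollama pull "$m" || return 1
    fi
  done
}
```

ensure_models llama3.1:8b qwen2.5-coder:14b deepseek-r1:8b llava:13b pulls whichever of the four picks are missing. The exact match means llama3.1 does not count as llama3.1:8b, since Ollama lists untagged pulls as :latest.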
Memory budget
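Rule of thumb for M-series Macs (unified memory): at 8-bit quantization, RAM usage in GB ≈ parameters in billions × 1.2, so a 14B model needs about 17GB. Ollama's default tags are 4-bit quantized and need roughly half that, which is where the ~5GB (8B) and ~9GB (14B) figures above come from. Leave 2–4GB for the OS on top, and 16GB is the practical minimum for a 14B model at the default quantization.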
8GB: stick to 7B–8B models. llama3.1:8b, qwen2.5-coder:7b, phi3:mini. Comfortable, fast.
16GB: 14B is the sweet spot. qwen2.5-coder:14b works, deepseek-r1:14b works. Still leaves headroom for Chrome.
32GB and up: 32B opens up. qwen2.5-coder:32b, plus llama3.1:70b (Q4) on the 64GB+ tier. Frontier-ish on a laptop.
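The rule of thumb generalizes if you make bits per weight explicit: at 8 bits it is exactly parameters × 1.2, while the 4-bit default tags come out at roughly half, which lines up with the ~5GB and ~9GB figures in the model picks. A rough sketch of that arithmetic; estimate_ram_gb, fits_in_ram, and the 4GB default OS reserve are assumptions for illustration, not anything Ollama reports.

```shell
# Rough RAM estimate for a local model:
#   RAM (GB) ≈ params (billions) × bits-per-weight / 8 × 1.2
# At 8 bits this reduces to params × 1.2. An approximation only.
estimate_ram_gb() {
  local params="$1" bits="${2:-4}"
  awk -v p="$params" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}

# fits_in_ram TOTAL_GB PARAMS [BITS] [RESERVE_GB]
# Succeeds if the estimate plus an OS reserve (default 4GB) fits in TOTAL_GB.
fits_in_ram() {
  local total="$1" params="$2" bits="${3:-4}" reserve="${4:-4}"
  awk -v t="$total" -v p="$params" -v b="$bits" -v r="$reserve" \
    'BEGIN { exit !(p * b / 8 * 1.2 + r <= t) }'
}
```

On macOS, sysctl -n hw.memsize reports total memory in bytes, so fits_in_ram "$(( $(sysctl -n hw.memsize) / 1073741824 ))" 14 && echo "14B fits" checks the current machine.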
Custom Modelfile
When a stock model isn't quite right, drop a Modelfile into your project and build a custom variant with ollama create:
FROM qwen2.5-coder:14b
SYSTEM """You are a senior pair programmer.
Default to TypeScript. Prefer functional, immutable code.
Always explain non-obvious choices. Never invent APIs."""
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER stop "<|im_end|>"
# Build the custom model
ollama create coder-mike -f Modelfile
# Try it from the CLI; in Hermes / OpenClaw / OpenCode, set the model to coder-mike
ollama run coder-mike "refactor this to async/await"
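Keeping the Modelfile and its build step in one script makes the variant easy to recreate on another machine. A minimal sketch, assuming ollama is on your PATH; build_coder_model is a hypothetical name, and the Modelfile contents are copied verbatim from above.

```shell
# build_coder_model [DIR]: write the Modelfile into DIR (default: current
# directory) and build the coder-mike variant from it. Hypothetical helper.
build_coder_model() {
  local dir="${1:-.}"
  # Quoted 'EOF' keeps the shell from expanding anything inside the heredoc
  cat > "$dir/Modelfile" <<'EOF'
FROM qwen2.5-coder:14b
SYSTEM """You are a senior pair programmer.
Default to TypeScript. Prefer functional, immutable code.
Always explain non-obvious choices. Never invent APIs."""
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
PARAMETER stop "<|im_end|>"
EOF
  ollama create coder-mike -f "$dir/Modelfile"
}
```

Afterwards, ollama show coder-mike --modelfile prints the Modelfile Ollama stored, which is a quick way to confirm the system prompt and parameters took effect.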
Smoke test
# Hit Ollama directly with curl
curl http://127.0.0.1:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-coder:14b",
"messages": [{"role": "user", "content": "Say hello in one word."}]
}'
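If you get a JSON response with a choices array, the OpenAI-compatible endpoint works and every tool on this site can talk to your Ollama. (An error saying the model was not found means you haven't pulled qwen2.5-coder:14b yet.) Time to wire it up to Hermes or OpenClaw, or jump straight to OpenCode / Claude Code.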
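For a scripted version of the smoke test (say, the last step of a setup script), the same request can be wrapped in a function that fails loudly. A minimal sketch: smoke_test is a hypothetical name, and it only checks that the response contains "choices" rather than fully parsing the JSON.

```shell
# smoke_test MODEL [URL]: send one chat request to Ollama's OpenAI-compatible
# endpoint and succeed only if the reply mentions a "choices" array.
# Plain string match, not a JSON parse.
smoke_test() {
  local model="$1" url="${2:-http://127.0.0.1:11434}" body
  body="$(curl -sS --max-time 120 "$url/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one word.\"}]}")" || return 1
  if printf '%s' "$body" | grep -q '"choices"'; then
    echo "OK: $model answered"
  else
    # Show the raw body; Ollama's error message usually explains the failure
    echo "FAIL: $body" >&2
    return 1
  fi
}
```

Run smoke_test qwen2.5-coder:14b, or smoke_test coder-mike for the custom variant.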