Using Claude Code with Docker Model Runner

Step‑by‑step wiki guide on what Claude Code and Docker Model Runner are, why they complement each other, and how to install, configure, and optimize them for secure, local AI development.

31 January 2026 by Suraj Barman

What Is Claude Code?

Claude Code is Anthropic's command-line coding agent. It reads and edits files in your project, runs commands, and talks to a large language model (LLM) through the Anthropic API. Because the API endpoint is configurable, it can be pointed at any Anthropic-compatible server, which suits developers who want to keep data and execution under their control.

What Is Docker Model Runner?

Docker Model Runner (DMR) is a Docker‑based service that hosts LLMs locally and exposes an Anthropic‑compatible API. By running models in containers, DMR offers isolation, reproducibility, and predictable cost without relying on external cloud endpoints.

Why Combine Claude Code with Docker Model Runner?

Data sovereignty: All prompts and responses stay on your hardware.
Cost predictability: No per‑token fees from third‑party APIs.
Performance control: Adjust CPU, GPU, and memory allocations per model.
Security: Containers isolate the model runtime from the host system.
Flexibility: Swap models or adjust context size without changing Claude Code.

How to Install Claude Code

Claude Code provides platform‑specific installers.

macOS / Linux: curl -fsSL | bash
Windows PowerShell: irm | iex

How to Set Up Docker Model Runner

1. Install Docker (Docker Desktop, or Docker Engine on Linux).
2. Pull a model you wish to run, e.g., docker model pull gpt-oss.
3. (Optional) Repackage the model with a larger context size (see next section).
4. Start the Model Runner service:

For Docker Desktop users, enable TCP access: docker desktop enable model-runner --tcp
Once enabled, the service listens on its default local TCP address.
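
Before moving on, it can help to confirm that both command-line tools are on the PATH and that the runner responds. The following is a minimal sketch for a POSIX-style shell. `missing_tools` and `preflight` are hypothetical helper names, and `docker model status` is used on the assumption that your Docker version ships that subcommand.

```shell
# Print the name of every given command that is not on PATH.
# Returns 0 only when nothing is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Preflight for this guide (hypothetical helper): both CLIs must be
# installed, and the Model Runner must answer `docker model status`.
preflight() {
  missing_tools docker claude || return 1
  docker model status
}
```

Running `preflight` prints the names of any missing tools and stops; otherwise it hands over to Docker to report the runner's state.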

How to Increase Model Context Size

Some coding models benefit from extended context windows. Docker Model Runner can repackage any model with a custom token limit.

Pull the base model: docker model pull gpt-oss
Repackage with a larger context (e.g., 32k tokens): docker model package --from ai/gpt-oss --context-size 32000 gpt-oss:32k
Run Claude Code against the repackaged model (gpt-oss:32k); the API endpoint stays the same.
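
If you expect to try several context sizes, the package command can be generated rather than typed by hand. This sketch only prints the command so it can be reviewed before running; `package_cmd` is a hypothetical helper, and the name:Nk tag format simply mirrors the gpt-oss:32k example above.

```shell
# Print the `docker model package` command for a base model and a
# context size, using the same flags and tag style as the example above.
# Example: package_cmd ai/gpt-oss 32000
#   -> docker model package --from ai/gpt-oss --context-size 32000 gpt-oss:32k
package_cmd() {
  base="$1"                 # base model, e.g. ai/gpt-oss
  ctx="$2"                  # context size in tokens, e.g. 32000
  name="${base##*/}"        # strip the namespace: ai/gpt-oss -> gpt-oss
  tag="$((ctx / 1000))k"    # 32000 -> 32k
  printf 'docker model package --from %s --context-size %s %s:%s\n' \
    "$base" "$ctx" "$name" "$tag"
}
```

Whether a given model behaves well at a larger context size depends on the model and on available memory, so treat any value other than the one shown in this guide as something to test.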

How to Connect Claude Code to Docker Model Runner

Claude Code reads the ANTHROPIC_BASE_URL environment variable to determine the API endpoint.

Set the variable in your shell (temporary): export ANTHROPIC_BASE_URL=
For a permanent setup, add the export line to ~/.bashrc, ~/.zshrc, or the Windows equivalent.

After setting the variable, launch Claude Code as usual; it will route all requests to the local Docker Model Runner.
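
To try the local endpoint without changing the rest of your shell session, the variable can be set for a single invocation. This is a minimal sketch: `claude_local` is a hypothetical wrapper, and the base URL is passed in as an argument rather than hard-coded, so supply the address your own runner listens on.

```shell
# Run Claude Code against a given endpoint for this invocation only.
# Usage: claude_local <base-url> [claude arguments...]
claude_local() {
  url="${1:?usage: claude_local <base-url> [claude arguments...]}"
  shift
  # The assignment prefix applies to this one command; the variable in
  # the surrounding shell is left as it was.
  ANTHROPIC_BASE_URL="$url" claude "$@"
}
```

Any further arguments are passed through to `claude` unchanged. For example, `claude_local "$DMR_URL" --model gpt-oss:32k` would request the repackaged model, assuming `DMR_URL` holds your runner's address and that the runner selects models by the name given to `--model`.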

How to Monitor Requests Sent by Claude Code

Docker Model Runner includes a built‑in request logger.

Run: docker model requests --model gpt-oss:32k | jq .
The command streams raw JSON payloads, useful for debugging API compatibility or inspecting token usage.
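
When debugging, it is often convenient to keep a copy of the stream for later comparison while still viewing it live. A small sketch using `tee`; `capture_requests` is a hypothetical helper and the log file name is up to you.

```shell
# Append a stream of log lines to a file while passing it through
# unchanged, so it can still be piped into jq or another viewer.
# Usage: <producer> | capture_requests <logfile> | <consumer>
capture_requests() {
  logfile="${1:?usage: capture_requests <logfile>}"
  tee -a "$logfile"
}
```

For example: `docker model requests --model gpt-oss:32k | capture_requests dmr-requests.log | jq .` Keep in mind that the captured file contains your prompts, so store it with the same care as the source code it refers to.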

Why Persist the Configuration?

Persisting ANTHROPIC_BASE_URL ensures every new terminal session automatically targets your local runner, eliminating the risk of accidentally sending data to a remote service.

Add the export line to your shell profile.
Verify with echo $ANTHROPIC_BASE_URL after opening a new terminal.
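
The two steps above can be sketched as a pair of helpers: one appends the export line only if the profile does not already contain one, the other checks that a URL really points at the local machine. Both function names are hypothetical, the default profile path is an assumption, and the locality check only recognizes localhost and 127.0.0.1 (not IPv6 or other loopback aliases).

```shell
# Append an export line for ANTHROPIC_BASE_URL to a profile file,
# unless that file already exports the variable.
# Usage: persist_base_url <base-url> [profile-file]
persist_base_url() {
  url="${1:?usage: persist_base_url <base-url> [profile-file]}"
  profile="${2:-$HOME/.bashrc}"   # assumed default; pass ~/.zshrc for zsh
  if grep -q '^export ANTHROPIC_BASE_URL=' "$profile" 2>/dev/null; then
    echo "ANTHROPIC_BASE_URL already set in $profile; leaving it unchanged" >&2
    return 0
  fi
  printf 'export ANTHROPIC_BASE_URL="%s"\n' "$url" >> "$profile"
}

# Succeed only when the URL's host is localhost or 127.0.0.1.
is_local_url() {
  hostport="${1#*://}"         # drop the scheme
  hostport="${hostport%%/*}"   # drop any path
  host="${hostport%%:*}"       # drop any port
  case "$host" in
    localhost|127.0.0.1) return 0 ;;
    *) return 1 ;;
  esac
}
```

In a new terminal, `is_local_url "$ANTHROPIC_BASE_URL" && echo "local endpoint"` gives a quick confirmation that requests will not leave the machine.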

Best Practices and Additional Resources

Regularly update Docker Model Runner and model images to receive security patches.
Use Docker’s resource limits (CPU, memory, GPU) to prevent a runaway model from exhausting host resources.
Consider enabling TLS on the Model Runner API for added network security, especially if accessed from other machines.
Explore the ai namespace on Docker Hub for pre-packaged models; Docker's MCP Catalog covers containerized MCP servers rather than model images.
Read the companion posts: “OpenCode with Docker Model Runner for Private AI Coding” and “Docker Model Runner GA Announcement”.