Local AI MCP

Unified MCP server for managing local model runtimes (Ollama, LM Studio, and more), with provider-agnostic discovery, lifecycle management, hardware-fit checks, and delegated inference.


What it is

An operations-first control plane for the models on your own machine. It discovers, inspects, and manages local runtimes through their local HTTP APIs, checks whether models fit your hardware, and exposes one consistent tool surface across all of them. It communicates over stdio only and acts as a client of your runtimes; it never opens a network listener of its own.

Inference is delegation, not chat

The complete and embed tools delegate (offload) inference to a local model for cost control and privacy, keeping tokens and data on your hardware instead of sending them to a hosted API. They are inference primitives, not a conversational chat surface.
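To illustrate what a delegated call might send, the sketch below builds requests for the OpenAI-compatible /v1/completions and /v1/embeddings endpoints that both Ollama and LM Studio expose. It assumes the adapters target those endpoints; the HttpRequest shape and the function names are hypothetical, not this package's API.

```typescript
// Hypothetical request shape; the real adapters may build requests differently.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a non-streaming text completion for an OpenAI-compatible runtime.
function buildCompletionRequest(
  baseUrl: string,
  model: string,
  prompt: string,
  maxTokens: number = 256,
): HttpRequest {
  return {
    url: new URL("/v1/completions", baseUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, max_tokens: maxTokens, stream: false }),
  };
}

// Build an embedding request; `input` may hold several texts in one call.
function buildEmbeddingRequest(baseUrl: string, model: string, input: string[]): HttpRequest {
  return {
    url: new URL("/v1/embeddings", baseUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input }),
  };
}
```

Either request can then be sent with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). Keeping request construction separate from I/O makes the shape easy to unit-test without a running runtime.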

Provider-adapter model

Each runtime is implemented as an adapter behind a single Provider interface. Every tool accepts an optional provider argument; if you omit it, the tool operates across all detected runtimes.
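A minimal sketch of the provider-adapter idea. The member names (id, isAvailable, listModels) are hypothetical, and the methods are synchronous for brevity where real adapters would make async HTTP calls. It shows how an optional provider argument can resolve either to one runtime or to every detected runtime.

```typescript
interface ModelSummary {
  provider: string;
  name: string;
}

// Hypothetical adapter contract; the real Provider interface may differ.
interface Provider {
  readonly id: string; // e.g. "ollama" or "lmstudio"
  isAvailable(): boolean; // stand-in for a live detection probe
  listModels(): ModelSummary[];
}

// Resolve the optional `provider` argument: one named runtime, or all detected ones.
function selectProviders(all: Provider[], provider?: string): Provider[] {
  const detected = all.filter((p) => p.isAvailable());
  if (provider === undefined) return detected;
  const matches = detected.filter((p) => p.id === provider);
  if (matches.length === 0) throw new Error(`Provider "${provider}" is not detected`);
  return matches;
}

// A tool such as list_models fans out over the selected providers and merges results.
function listModels(all: Provider[], provider?: string): ModelSummary[] {
  const out: ModelSummary[] = [];
  for (const p of selectProviders(all, provider)) {
    out.push(...p.listModels());
  }
  return out;
}

// Static fake adapter, for illustration only.
function fakeProvider(id: string, up: boolean, models: string[]): Provider {
  return {
    id,
    isAvailable: () => up,
    listModels: () => models.map((name) => ({ provider: id, name })),
  };
}
```

Skipping undetected runtimes in the fan-out case, while failing loudly when a named runtime is missing, is one reasonable policy; the package may handle these cases differently.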

| Adapter | Default host | Notes |
|---|---|---|
| Ollama | http://localhost:11434 | Native REST plus OpenAI-compatible /v1; load/unload via keep_alive. |
| LM Studio | http://localhost:1234 | REST /api/v0 plus OpenAI-compatible API; uses the lms CLI for load/unload/pull when present. |
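Ollama documents loading and unloading through its generate endpoint: a request with a model but no prompt loads it, and the same request with keep_alive set to 0 evicts it. Below is a sketch of request bodies an adapter could send; the helper names are hypothetical.

```typescript
// keep_alive accepts seconds (number) or a duration string such as "10m";
// a negative value keeps the model resident indefinitely.
type KeepAlive = number | string;

interface OllamaLifecycleBody {
  model: string;
  keep_alive: KeepAlive;
}

// No `prompt` field: Ollama loads the model and returns without generating.
function ollamaLoadBody(model: string, keepAlive: KeepAlive = "5m"): OllamaLifecycleBody {
  return { model, keep_alive: keepAlive };
}

// keep_alive: 0 asks Ollama to unload the model immediately.
function ollamaUnloadBody(model: string): OllamaLifecycleBody {
  return { model, keep_alive: 0 };
}

// Both bodies are POSTed to the native generate endpoint.
function ollamaGenerateUrl(host: string): string {
  return new URL("/api/generate", host).toString();
}
```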

Tools (16)

list_providers

Runtimes, host, live status, capabilities.

list_models

Installed models across providers.

list_loaded

Models resident in memory.

model_info

Detailed model metadata.

pull_model (heavy)

Download a model (multiple GB).

remove_model (destructive)

Delete a model; requires confirm.

load_model

Load a model into memory.

unload_model

Evict a model from memory.

health_check

Liveness and version per provider.

system_resources

RAM, CPU, and GPU/VRAM.

fit_check

Does a model fit in VRAM or RAM?

benchmark (heavy)

Latency and tokens/sec.

search_available

Search a curated model catalog.

suggest_model

Recommend by task and hardware fit.

complete

Delegate a completion (offload).

embed

Delegate embedding generation.
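Two of these tools, fit_check and benchmark, reduce to simple arithmetic. The sketch below assumes a fit rule of model size plus a fixed 20% overhead (an illustrative guess, not the tool's actual formula) checked against free VRAM first and then free RAM. It computes throughput from the eval_count and eval_duration (nanoseconds) fields that Ollama returns on a completed generate response. Function names are hypothetical.

```typescript
type Placement = "vram" | "ram" | "none";

const GiB = 1024 ** 3;

// Hypothetical heuristic: weights plus a fractional overhead for the KV cache
// and runtime buffers. The 0.2 default is illustrative only.
function fitCheck(
  modelBytes: number,
  freeVramBytes: number,
  freeRamBytes: number,
  overhead: number = 0.2,
): Placement {
  const required = modelBytes * (1 + overhead);
  if (required <= freeVramBytes) return "vram"; // prefer GPU memory
  if (required <= freeRamBytes) return "ram"; // fall back to system memory
  return "none";
}

// tokens/sec = eval_count / eval_duration * 1e9, with eval_duration in nanoseconds.
function tokensPerSecond(evalCount: number, evalDurationNs: number): number {
  if (evalDurationNs <= 0) throw new RangeError("eval_duration must be positive");
  return (evalCount * 1e9) / evalDurationNs;
}
```

For example, a 4 GiB model needs about 4.8 GiB under this rule, so it fits an 8 GiB GPU, while a 10 GiB model (about 12 GiB required) falls back to RAM.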

Install

npx @tmhs/local-ai-mcp

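Configure it with environment variables: OLLAMA_HOST and LMSTUDIO_HOST override the default hosts, and LOCAL_AI_REQUEST_TIMEOUT_MS and LOCAL_AI_DETECT_TIMEOUT_MS set the request and runtime-detection timeouts in milliseconds.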
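A sketch of resolving these settings from the environment. The host defaults come from the adapter table; the timeout defaults (120000 ms and 2000 ms) are hypothetical placeholders, not the package's documented values, and loadConfig is an illustrative name.

```typescript
interface LocalAiConfig {
  ollamaHost: string;
  lmstudioHost: string;
  requestTimeoutMs: number;
  detectTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive millisecond value, falling back when the variable is unset or blank.
function parseMs(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isFinite(n) || n <= 0) throw new Error(`Invalid timeout: ${raw}`);
  return n;
}

function loadConfig(env: Env): LocalAiConfig {
  return {
    ollamaHost: env.OLLAMA_HOST ?? "http://localhost:11434",
    lmstudioHost: env.LMSTUDIO_HOST ?? "http://localhost:1234",
    requestTimeoutMs: parseMs(env.LOCAL_AI_REQUEST_TIMEOUT_MS, 120_000), // hypothetical default
    detectTimeoutMs: parseMs(env.LOCAL_AI_DETECT_TIMEOUT_MS, 2_000), // hypothetical default
  };
}
```

In Node this would typically be called as loadConfig(process.env). Taking the environment as a parameter keeps the function testable with plain objects.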