A small, focused CLI for managing local LLM servers. Define your models once in
TOML, then start, stop, restart, inspect status, tail logs, or drop
into a live tui dashboard — without remembering which llama-server flags you
used last week.
$ modelctl status
╭─────────────────┬───────────┬─────────────┬─────────┬───────┬───────────────────────╮
│ name            │ runtime   │ location    │ status  │ pid   │ endpoint              │
├─────────────────┼───────────┼─────────────┼─────────┼───────┼───────────────────────┤
│ gemma-4-26b-a4b │ llama_cpp │ local       │ stopped │ -     │ http://127.0.0.1:8001 │
│ gemma-4-31b     │ llama_cpp │ local       │ running │ 86515 │ http://127.0.0.1:8002 │
│ shadow-gpt-oss  │ llama_cpp │ ssh:shadow  │ running │ 33195 │ http://127.0.0.1:8003 │
╰─────────────────┴───────────┴─────────────┴─────────┴───────┴───────────────────────╯
modelctl is a personal launcher, not a replacement for Ollama or LM Studio.
It stays out of the way: one config file, one PID per model, one log file per
model, one binary. Write-once configuration, reproducible launches, and an
opinionated but quiet TUI.
- Runtime-agnostic core, `llama.cpp` adapter first. The `Runtime` trait makes it straightforward to add `mlx-lm`, `vllm`, or any forked `llama-server` (e.g. PrismML's 1-bit Bonsai fork) as a new adapter.
- Detached processes with PID and log tracking. Servers keep running after you close the shell. Stale PID files are detected and cleaned automatically.
- OpenAI-compatible API out of the box. Because the first runtime is `llama-server`, every model you launch speaks `/v1/chat/completions` at the configured host and port.
- Live `tui` dashboard. Ratatui-based, feature-gated, read-only by design. Shows configured models, live status, and a tailed log pane for the selected row. Colors align with macmon: terminal-native `Color::Green`, rounded borders, no forced backgrounds.
- System metrics pane (Apple Silicon). When built with the default `metrics` feature on an Apple Silicon Mac, the TUI shows live sparkline charts for CPU %, GPU %, RAM usage, and power draw via macmon. Useful for watching how a model load affects your machine without switching terminals.
- Shell-expanded config paths. `~/llama.cpp/build/bin/llama-server` works as-is; no need to hardcode absolute paths.
- Optional remote-host mode via ssh. Add `ssh_host = "alias-or-user@host"` to a `[models.<name>]` block and modelctl manages that model on the remote machine (start/stop/restart/status/logs) over ssh, without duplicating config between hosts. Uses key-based auth with `BatchMode=yes`; remote logs land at `/tmp/modelctl-<name>.log` on the target host and are streamed locally via `modelctl logs -f`.
- Single ~2 MB release binary. No Python interpreter, no virtualenv, no runtime dependencies beyond libc.
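Since every launched model exposes a plain OpenAI-compatible HTTP endpoint, you can probe it from your own tooling. A minimal stdlib-only sketch (not modelctl code) that checks whether anything is accepting connections at a configured endpoint:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Return true if something is accepting TCP connections at `addr`.
/// A running llama-server will accept; a stopped model will not.
fn endpoint_up(addr: &str) -> bool {
    match addr.parse::<SocketAddr>() {
        // Short timeout so a dead endpoint fails fast.
        Ok(sock) => TcpStream::connect_timeout(&sock, Duration::from_millis(300)).is_ok(),
        Err(_) => false,
    }
}

fn main() {
    println!("8002 up: {}", endpoint_up("127.0.0.1:8002"));
}
```

This only confirms the port is open, not that the model has finished loading; `modelctl logs` is the place to watch for the server's ready message.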
git clone https://github.com/wrale/modelctl.git
cd modelctl
cargo install --path .

The release build lands at ~/.cargo/bin/modelctl (~2 MB).
- Rust 1.80+ for the `cargo install` path
- macOS or Linux (tested on macOS; Linux should work: the `dirs` crate resolves state and config paths correctly on both)
- A configured runtime binary: modelctl launches external servers. For the `llama_cpp` adapter, a working `llama-server` must exist at the path you point `binary` to in the config.
# Create a starter config with a placeholder Gemma 4 entry
modelctl config init
# Edit it to match your local model files
$EDITOR "$(modelctl config path)"
# Launch
modelctl start gemma-4-31b
modelctl status
modelctl logs gemma-4-31b -f
# Live dashboard
modelctl tui
# Stop
modelctl stop gemma-4-31b

modelctl reads its config from the platform-native config directory:
| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/modelctl/models.toml |
| Linux | ~/.config/modelctl/models.toml |
Run modelctl config path to print the resolved location.
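The platform branch in the table above can be approximated in a few lines. This is an illustrative stdlib sketch of what the `dirs` crate resolves; the real crate also honors overrides like `$XDG_CONFIG_HOME` on Linux:

```rust
use std::path::PathBuf;

/// Approximate the path `modelctl config path` would print.
/// Sketch only: the real implementation delegates to the `dirs` crate.
fn models_toml(home: &str) -> PathBuf {
    let base = if cfg!(target_os = "macos") {
        format!("{home}/Library/Application Support")
    } else {
        format!("{home}/.config")
    };
    PathBuf::from(base).join("modelctl").join("models.toml")
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_else(|_| "/root".into());
    println!("{}", models_toml(&home).display());
}
```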
[models.gemma-4-26b-a4b]
runtime = "llama_cpp"
binary = "~/llama.cpp/build/bin/llama-server"
model = "~/models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf"
host = "127.0.0.1"
port = 8001
extra_args = [
"--temp", "1.0",
"--top-p", "0.95",
"--top-k", "64",
"--reasoning", "on",
"-ngl", "99",
]
[models.gemma-4-31b]
runtime = "llama_cpp"
binary = "~/llama.cpp/build/bin/llama-server"
model = "~/models/gemma-4-31B-it-UD-Q4_K_XL.gguf"
host = "127.0.0.1"
port = 8002
extra_args = [
"--temp", "1.0",
"--top-p", "0.95",
"--top-k", "64",
"--reasoning", "on",
"-ngl", "99",
"--ctx-size", "8192",
"--cache-type-k", "q8_0",
"--cache-type-v", "q8_0",
]

| Field | Required | Description |
|---|---|---|
| `runtime` | yes | Adapter name. Currently `llama_cpp`. |
| `binary` | yes | Path to the server executable. `~` is expanded locally; for `ssh_host` entries the path is on the remote host and not expanded locally. |
| `model` | yes | Path to the model file (e.g. a GGUF). Same local-vs-remote semantics as `binary`. |
| `host` | no | Bind host. Passed as `--host` if set. |
| `port` | no | Bind port. Passed as `--port` if set. |
| `extra_args` | no | Array of additional flags passed through verbatim. |
| `ssh_host` | no | If set, the model is managed on a remote host via ssh (alias from `~/.ssh/config` or user@host). See "Remote hosts" below. |
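The local tilde expansion mentioned above can be sketched as follows. This is illustrative, not modelctl's actual implementation, which may use a helper crate:

```rust
use std::path::PathBuf;

/// Expand a leading `~` against the given home directory, the way
/// modelctl treats local `binary`/`model` paths. Paths belonging to
/// `ssh_host` entries are left untouched.
fn expand_tilde(path: &str, home: &str) -> PathBuf {
    if let Some(rest) = path.strip_prefix("~/") {
        PathBuf::from(home).join(rest)
    } else {
        PathBuf::from(path)
    }
}

fn main() {
    let p = expand_tilde("~/llama.cpp/build/bin/llama-server", "/Users/me");
    println!("{}", p.display());
}
```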
Add an ssh_host field to any [models.<name>] block to have modelctl
manage that model on a remote machine over ssh instead of locally:
[models.remote-gpt-oss-20b]
runtime = "llama_cpp"
ssh_host = "shadowfax" # alias from ~/.ssh/config, or user@host
binary = "/home/josh/src/llama.cpp/build-cuda12/bin/llama-server"
model = "/home/josh/models/gpt-oss-20b-F16.gguf"
host = "127.0.0.1"
port = 8003
extra_args = [
"--jinja",
"-ngl", "99",
"--ctx-size", "32768",
"--cache-type-k", "q8_0",
"--cache-type-v", "q8_0",
"--parallel", "1",
]

Semantics:
- `binary` and `model` are paths on the remote filesystem. They are not expanded or checked locally.
- `start` spawns the server on the remote host via `ssh <host> 'setsid ... & echo $!'`, captures the remote PID, and stores it in the local state dir so subsequent `stop`/`status`/`restart` operations know what to target.
- `stop` runs `ssh <host> 'kill <pid>'`, polls via `kill -0`, and escalates to `kill -9` after ~6 seconds if needed. Same timing as the local path.
- `status` checks liveness with `ssh <host> 'kill -0 <pid>'` under a short connect timeout so unreachable hosts fail fast instead of hanging the table.
- `logs` streams `ssh <host> 'tail -F /tmp/modelctl-<name>.log'` for follow mode, or `cat` for a one-shot read. Remote logs live at a predictable path on the target host.
- `restart` sequences the remote stop and remote start correctly.
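The remote-start step reduces to building an `ssh` argv. A hypothetical sketch based only on the semantics described here; the function name and exact flag layout are illustrative, not modelctl's source:

```rust
/// Build the argv a modelctl-style remote start would hand to `ssh`.
/// Hypothetical sketch: names and flag layout are illustrative.
fn remote_start_argv(host: &str, name: &str, server_cmd: &str) -> Vec<String> {
    let log = format!("/tmp/modelctl-{name}.log");
    vec![
        "ssh".into(),
        "-o".into(),
        "BatchMode=yes".into(), // fail fast instead of prompting for a password
        host.into(),
        // setsid detaches the server from the ssh session;
        // `echo $!` reports the remote PID back for the local state dir.
        format!("setsid {server_cmd} >{log} 2>&1 & echo $!"),
    ]
}

fn main() {
    for arg in remote_start_argv("shadowfax", "remote-gpt-oss-20b", "llama-server -m model.gguf") {
        println!("{arg}");
    }
}
```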
Requirements for the ssh_host path:
- Key-based ssh auth to the target host: modelctl invokes ssh with `BatchMode=yes`, so password prompts will fail fast.
- An ssh alias or user@host that resolves to your target. If you use an alias from `~/.ssh/config`, any `LocalForward`, `ProxyJump`, or `IdentityFile` settings there are honored.
- Writable `/tmp` on the remote host for the log file.
Remote entries show up in `modelctl status` with `ssh:<host>` in the
location column, so it's obvious which machine each model lives on; local
entries show `local` there. The endpoint column shows the service's
listen address, which for a remote entry is on the remote host's loopback
and not directly reachable from your machine. Reach it from the local
side via an ssh `LocalForward`, or use `modelctl logs` / `modelctl status`, which tunnel through ssh themselves.
Only the llama_cpp runtime supports remote mode today. Other runtimes
work locally as always but will error on start if given an ssh_host.
| Command | Behavior |
|---|---|
| `modelctl start <name>` | Spawn the configured server detached. Writes PID + log. |
| `modelctl stop <name>` | SIGTERM, falls back to SIGKILL after ~6s. |
| `modelctl restart <name>` | Stop (if running) then start. |
| `modelctl status` / `ls` | Table of all configured models with live status. |
| `modelctl logs <name> [-f]` | Print or tail the log file. |
| `modelctl config path` | Print the config file path. |
| `modelctl config init` | Write a starter config if none exists. |
| `modelctl tui` | Launch the live dashboard (feature-gated). |
| `modelctl about` | Print license and third-party attribution. |
Start and stop are idempotent: starting a running model errors with its PID; stopping a stale PID clears the file.
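The idempotency rule amounts to a small decision over the recorded PID. A hedged sketch; the enum and names are hypothetical, not modelctl's API:

```rust
/// What `start` does given the recorded PID state.
/// Illustrative names only; modelctl's internals may differ.
#[derive(Debug, PartialEq)]
enum StartAction {
    Spawn,               // no PID file: launch fresh
    AlreadyRunning(u32), // live PID: refuse, reporting the PID
    ClearStaleAndSpawn,  // PID file exists but the process is gone
}

fn plan_start(recorded_pid: Option<u32>, is_alive: impl Fn(u32) -> bool) -> StartAction {
    match recorded_pid {
        None => StartAction::Spawn,
        Some(pid) if is_alive(pid) => StartAction::AlreadyRunning(pid),
        Some(_) => StartAction::ClearStaleAndSpawn,
    }
}

fn main() {
    // Pretend only PID 42 is alive.
    let alive = |p: u32| p == 42;
    println!("{:?}", plan_start(Some(42), alive));
    println!("{:?}", plan_start(Some(7), alive));
}
```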
Feature-gated behind two default cargo features: tui (ratatui + crossterm)
and metrics (macmon, Apple Silicon only). To build without the dashboard
and metrics pane (smaller binary, broader platform support):
cargo install --path . --no-default-features

modelctl v0.1.0                                                     0 running   3 configured
╭ models ──────────────────────────────────────────────────────────────────────────────────╮
│     name            runtime    status   pid    endpoint                                  │
│ › ○ bonsai-8b       llama_cpp  stopped  —      http://127.0.0.1:8005                     │
│   ○ gemma-4-26b-a4b llama_cpp  stopped  —      http://127.0.0.1:8001                     │
│   ○ gemma-4-31b     llama_cpp  stopped  —      http://127.0.0.1:8002                     │
╰──────────────────────────────────────────────────────────────────────────────────────────╯
╭ log bonsai-8b ───────────────────────────────────────────────────────────────────────────╮
│slot update_slots: id 2 | task 432 | n_tokens = 276, memory_seq_rm [276, end) │
│slot init_sampler: id 2 | task 432 | init sampler, took 0.04 ms, tokens: text = 277, │
│total = 277 │
│slot update_slots: id 2 | task 432 | prompt processing done, n_tokens = 277, │
│batch.n_tokens = 4 │
│srv params_from_: Chat format: Hermes 2 Pro │
│slot print_timing: id 1 | task 383 | │
╰──────────────────────────────────────────────────────────────────────────────────────────╯
╭ CPU 3% ───── 41°C ╮╭ GPU 1% ───── 38°C ╮╭ RAM 15.1/32G ── 47% ╮╭ PWR 0.1W ───────────╮
│ ││ ││ ││ │
│ ││ ││ ││ │
│ ││ ││▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂││ │
│ ││ ││█████████████████████││ │
│▁▁▂▂▁▁▂▁▁▂ ▁▁▁▂▃▂▂▂▂▁││ ▁ ▂ ▁▃▁▁ ││█████████████████████││▁▁▁▁▁▁▁▁ ▂ ▁ ▁▁▂▁▃▃▃ │
╰─────────────────────╯╰─────────────────────╯╰─────────────────────╯╰─────────────────────╯
↑↓/jk select   g/G top/bottom   q/Esc quit                            use CLI for start/stop
The metrics pane appears only on Apple Silicon Macs (the metrics feature
uses macmon's IOKit bindings, which require aarch64-apple-darwin). On
Intel Macs and Linux the TUI renders without it.
| Key | Action |
|---|---|
| ↑/↓, j/k | Select model |
| g/Home | Jump to first row |
| G/End | Jump to last row |
| q/Esc | Quit |
Log lines are colorized by heuristic keyword match:
| Keyword | Rendering |
|---|---|
| `error` / `fatal` / `panic` | reversed accent (alarm) |
| `warn` | bold accent |
| `listening` / `loaded` / `ready` | accent |
| `slot` / `init` / `srv` | dimmed |
| everything else | terminal default |
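The heuristic can be sketched as a keyword classifier. Precedence here (alarm first, dimmed last) is an assumption, and the real TUI code may match differently, e.g. on token position or case:

```rust
/// Style buckets from the table above.
#[derive(Debug, PartialEq)]
enum LogStyle { Alarm, Warn, Accent, Dimmed, Default }

/// Heuristic keyword match over a log line. Sketch only; the actual
/// modelctl matching rules may differ.
fn classify(line: &str) -> LogStyle {
    let l = line.to_lowercase();
    if ["error", "fatal", "panic"].iter().any(|k| l.contains(k)) {
        LogStyle::Alarm
    } else if l.contains("warn") {
        LogStyle::Warn
    } else if ["listening", "loaded", "ready"].iter().any(|k| l.contains(k)) {
        LogStyle::Accent
    } else if ["slot", "init", "srv"].iter().any(|k| l.contains(k)) {
        LogStyle::Dimmed
    } else {
        LogStyle::Default
    }
}

fn main() {
    println!("{:?}", classify("srv log: server is listening on 127.0.0.1:8002"));
}
```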
modelctl/
├── Cargo.toml
├── src/
│ ├── main.rs # clap dispatch
│ ├── config.rs # TOML loading, shell expansion, starter template
│ ├── state.rs # PID file, log file, process liveness via nix
│ ├── runtime/
│ │ ├── mod.rs # `trait Runtime { build_command }`
│ │ └── llama_cpp.rs # first adapter
│ ├── cmd/ # start, stop, restart, status, logs, config, about
│ ├── tui.rs # ratatui dashboard (feature "tui")
│ └── metrics.rs # macmon system metrics (feature "metrics")
└── .gitignore
- Create `src/runtime/<name>.rs` implementing `Runtime::build_command`.
- Register it in `runtime::for_name`.
- Reference it as `runtime = "<name>"` in `models.toml`.
That's the entire contract. The top-level start/stop/restart code paths
are runtime-agnostic — they just spawn whatever Command the runtime returns.
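A minimal sketch of that contract. Field and method names here are illustrative; check `src/runtime/mod.rs` for the real signatures:

```rust
use std::process::Command;

/// Illustrative subset of a model's config; the real struct lives in config.rs.
struct ModelConfig {
    binary: String,
    model: String,
    host: Option<String>,
    port: Option<u16>,
    extra_args: Vec<String>,
}

/// The adapter contract: turn a config entry into a launchable Command.
trait Runtime {
    fn build_command(&self, cfg: &ModelConfig) -> Command;
}

struct LlamaCpp;

impl Runtime for LlamaCpp {
    fn build_command(&self, cfg: &ModelConfig) -> Command {
        let mut cmd = Command::new(&cfg.binary);
        cmd.arg("-m").arg(&cfg.model);
        if let Some(h) = &cfg.host {
            cmd.arg("--host").arg(h);
        }
        if let Some(p) = cfg.port {
            cmd.arg("--port").arg(p.to_string());
        }
        cmd.args(&cfg.extra_args); // passed through verbatim
        cmd
    }
}

fn main() {
    let cfg = ModelConfig {
        binary: "llama-server".into(),
        model: "model.gguf".into(),
        host: Some("127.0.0.1".into()),
        port: Some(8002),
        extra_args: vec!["-ngl".into(), "99".into()],
    };
    let cmd = LlamaCpp.build_command(&cfg);
    let args: Vec<String> = cmd.get_args().map(|a| a.to_string_lossy().into_owned()).collect();
    println!("{} {}", cmd.get_program().to_string_lossy(), args.join(" "));
}
```

A new adapter only has to implement `build_command`; the start/stop machinery never needs to know which runtime produced the `Command`.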
# Format + lint + test + build
cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release
# Refresh third-party license bundle (required before committing any dep changes)
cargo install cargo-about # one-time
cargo about generate about.hbs > THIRD_PARTY_LICENSES.md

THIRD_PARTY_LICENSES.md is generated from Cargo.lock by
cargo-about and bundled into
the binary via include_str! so modelctl about prints everything at
runtime. Regenerate it whenever dependencies change.
Licensed under either of
- Apache License 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license, (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Third-party crates bundled into the release binary are listed in
THIRD_PARTY_LICENSES.md. The same content is
available at runtime with modelctl about.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in modelctl by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Copyright © 2026 Wrale Ltd.