
modelctl

A small, focused CLI for managing local LLM servers. Define your models once in TOML, then start, stop, restart, inspect status, tail logs, or drop into a live TUI dashboard — without remembering which llama-server flags you used last week.

$ modelctl status
╭─────────────────┬───────────┬─────────────┬─────────┬───────┬───────────────────────╮
│ name            │ runtime   │ location    │ status  │ pid   │ endpoint              │
├─────────────────┼───────────┼─────────────┼─────────┼───────┼───────────────────────┤
│ gemma-4-26b-a4b │ llama_cpp │ local       │ stopped │ -     │ http://127.0.0.1:8001 │
│ gemma-4-31b     │ llama_cpp │ local       │ running │ 86515 │ http://127.0.0.1:8002 │
│ shadow-gpt-oss  │ llama_cpp │ ssh:shadow  │ running │ 33195 │ http://127.0.0.1:8003 │
╰─────────────────┴───────────┴─────────────┴─────────┴───────┴───────────────────────╯

modelctl is a personal launcher, not a replacement for Ollama or LM Studio. It stays out of the way: one config file, one PID per model, one log file per model, one binary. Write-once configuration, reproducible launches, and an opinionated but quiet TUI.

Features

  • Runtime-agnostic core, llama.cpp adapter first. The Runtime trait makes it straightforward to add mlx-lm, vllm, or any forked llama-server (e.g. PrismML's 1-bit Bonsai fork) as a new adapter.
  • Detached processes with PID and log tracking. Servers keep running after you close the shell. Stale PID files are detected and cleaned automatically.
  • OpenAI-compatible API out of the box. Because the first runtime is llama-server, every model you launch speaks /v1/chat/completions at the configured host and port.
  • Live TUI dashboard. Ratatui-based, feature-gated, read-only by design. Shows configured models, live status, and a tailed log pane for the selected row. Colors align with macmon — terminal-native Color::Green, rounded borders, no forced backgrounds.
  • System metrics pane (Apple Silicon). When built with the default metrics feature on an Apple Silicon Mac, the TUI shows live sparkline charts for CPU %, GPU %, RAM usage, and power draw via macmon. Useful for watching how a model load affects your machine without switching terminals.
  • Shell-expanded config paths. ~/llama.cpp/build/bin/llama-server works as-is; no need to hardcode absolute paths.
  • Optional remote-host mode via ssh. Add ssh_host = "alias-or-user@host" to a [models.<name>] block and modelctl manages that model on the remote machine (start/stop/restart/status/logs) over ssh, without duplicating config between hosts. Uses key-based auth with BatchMode=yes; remote logs land at /tmp/modelctl-<name>.log on the target host and are streamed locally via modelctl logs -f.
  • Single ~2 MB release binary. No Python interpreter, no virtualenv, no runtime dependencies beyond libc.

Installation

From source

git clone https://github.com/wrale/modelctl.git
cd modelctl
cargo install --path .

Release build lands at ~/.cargo/bin/modelctl (~2 MB).

Requirements

  • Rust 1.80+ for the cargo install path
  • macOS or Linux (tested on macOS; Linux should work — the dirs crate resolves state and config paths correctly on both)
  • A configured runtime binary. modelctl launches external servers; for the llama_cpp adapter, a working llama-server must exist at the path the binary field points to in the config.

Quick Start

# Create a starter config with a placeholder Gemma 4 entry
modelctl config init

# Edit it to match your local model files
$EDITOR "$(modelctl config path)"

# Launch
modelctl start gemma-4-31b
modelctl status
modelctl logs gemma-4-31b -f

# Live dashboard
modelctl tui

# Stop
modelctl stop gemma-4-31b
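
Because the launcher runs llama-server, a started model immediately speaks the OpenAI-style chat API. A minimal smoke test against the gemma-4-31b example entry (port 8002); the prompt is illustrative, and the fallback message fires if nothing is listening:

```shell
# Query the gemma-4-31b endpoint from the example config (port 8002).
# --max-time keeps the probe snappy; the || branch fires when the model is down.
response=$(curl -s --max-time 5 http://127.0.0.1:8002/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemma-4-31b", "messages": [{"role": "user", "content": "Say hello."}]}' \
  || echo "model not running")
echo "$response"
```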

Configuration

modelctl reads its config from the platform-native config directory:

| Platform | Path |
| --- | --- |
| macOS | ~/Library/Application Support/modelctl/models.toml |
| Linux | ~/.config/modelctl/models.toml |

Run modelctl config path to print the resolved location.

Example models.toml

[models.gemma-4-26b-a4b]
runtime = "llama_cpp"
binary = "~/llama.cpp/build/bin/llama-server"
model = "~/models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf"
host = "127.0.0.1"
port = 8001
extra_args = [
    "--temp", "1.0",
    "--top-p", "0.95",
    "--top-k", "64",
    "--reasoning", "on",
    "-ngl", "99",
]

[models.gemma-4-31b]
runtime = "llama_cpp"
binary = "~/llama.cpp/build/bin/llama-server"
model = "~/models/gemma-4-31B-it-UD-Q4_K_XL.gguf"
host = "127.0.0.1"
port = 8002
extra_args = [
    "--temp", "1.0",
    "--top-p", "0.95",
    "--top-k", "64",
    "--reasoning", "on",
    "-ngl", "99",
    "--ctx-size", "8192",
    "--cache-type-k", "q8_0",
    "--cache-type-v", "q8_0",
]

Field reference

| Field | Required | Description |
| --- | --- | --- |
| runtime | yes | Adapter name. Currently llama_cpp. |
| binary | yes | Path to the server executable. ~ is expanded locally; for ssh_host entries the path is on the remote host and not expanded locally. |
| model | yes | Path to the model file (e.g. a GGUF). Same local-vs-remote semantics as binary. |
| host | no | Bind host. Passed as --host if set. |
| port | no | Bind port. Passed as --port if set. |
| extra_args | no | Array of additional flags passed through verbatim. |
| ssh_host | no | If set, the model is managed on a remote host via ssh (alias from ~/.ssh/config or user@host). See "Remote hosts" below. |
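
Putting the fields together: for the gemma-4-26b-a4b example above, modelctl assembles a launch roughly like the following (a sketch; the exact flag spelling and ordering come from the llama_cpp adapter):

```
~/llama.cpp/build/bin/llama-server \
  --model ~/models/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf \
  --host 127.0.0.1 --port 8001 \
  --temp 1.0 --top-p 0.95 --top-k 64 --reasoning on -ngl 99
```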

Remote hosts

Add an ssh_host field to any [models.<name>] block to have modelctl manage that model on a remote machine over ssh instead of locally:

[models.remote-gpt-oss-20b]
runtime = "llama_cpp"
ssh_host = "shadowfax"  # alias from ~/.ssh/config, or user@host
binary = "/home/josh/src/llama.cpp/build-cuda12/bin/llama-server"
model = "/home/josh/models/gpt-oss-20b-F16.gguf"
host = "127.0.0.1"
port = 8003
extra_args = [
    "--jinja",
    "-ngl", "99",
    "--ctx-size", "32768",
    "--cache-type-k", "q8_0",
    "--cache-type-v", "q8_0",
    "--parallel", "1",
]

Semantics:

  • binary and model are paths on the remote filesystem. They are not expanded or checked locally.
  • start spawns the server on the remote host via ssh <host> 'setsid ... & echo $!', captures the remote PID, and stores it in the local state dir so subsequent stop / status / restart operations know what to target.
  • stop runs ssh <host> 'kill <pid>', polls via kill -0, and escalates to kill -9 after ~6 seconds if needed. Same timing as the local path.
  • status checks liveness with ssh <host> 'kill -0 <pid>' under a short connect timeout so unreachable hosts fail fast instead of hanging the table.
  • logs streams ssh <host> 'tail -F /tmp/modelctl-<name>.log' for follow mode, or cat for a one-shot read. Remote logs live at a predictable path on the target host.
  • restart sequences the remote stop and remote start correctly.
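
The local half of this pattern can be sketched in a few lines of shell, with sleep standing in for llama-server; the kill -0 probe is the same liveness check status relies on:

```shell
# Spawn detached with output redirected to a log, record the PID,
# then probe liveness with `kill -0` (signal 0 = existence check only).
sleep 30 > /tmp/modelctl-demo.log 2>&1 &
pid=$!
if kill -0 "$pid" 2>/dev/null; then status="running"; else status="stopped"; fi
echo "$status"
kill "$pid" 2>/dev/null
```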

Requirements for the ssh_host path:

  • Key-based ssh auth to the target host — modelctl invokes ssh with BatchMode=yes, so password prompts will fail fast.
  • An ssh alias or user@host that resolves to your target. If you use an alias from ~/.ssh/config, any LocalForward, ProxyJump, or IdentityFile settings there are honored.
  • Writable /tmp on the remote host for the log file.

Remote entries show up in modelctl status with ssh:<host> in the location column, so it's obvious which machine each model lives on; local entries show local. The endpoint column shows the service's listen address. For remote entries that address is on the remote host's loopback and not directly reachable from your machine; reach it via an ssh LocalForward, or use modelctl logs / modelctl status, which tunnel over ssh themselves.
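
For example, with the shadowfax entry from above (remote port 8003), a one-off tunnel would look like this (assuming key-based auth; the /v1/models probe is illustrative):

```
ssh -N -L 8003:127.0.0.1:8003 shadowfax
# in another shell, the remote model is now reachable locally:
curl http://127.0.0.1:8003/v1/models
```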

Only the llama_cpp runtime supports remote mode today. Other runtimes work locally as always but will error on start if given an ssh_host.

Commands

| Command | Behavior |
| --- | --- |
| modelctl start <name> | Spawn the configured server detached. Writes PID + log. |
| modelctl stop <name> | SIGTERM, falls back to SIGKILL after ~6 s. |
| modelctl restart <name> | Stop (if running), then start. |
| modelctl status / ls | Table of all configured models with live status. |
| modelctl logs <name> [-f] | Print or tail the log file. |
| modelctl config path | Print the config file path. |
| modelctl config init | Write a starter config if none exists. |
| modelctl tui | Launch the live dashboard (feature-gated). |
| modelctl about | Print license and third-party attribution. |

Start and stop are safe to repeat: starting an already-running model fails fast and reports its PID; stopping a model whose PID file is stale simply clears the file.

TUI

The TUI is gated behind two cargo features, both enabled by default: tui (ratatui + crossterm) and metrics (macmon, Apple Silicon only). To build without the dashboard and metrics pane (smaller binary, broader platform support):

cargo install --path . --no-default-features

Layout

 modelctl v0.1.0                                                    0 running  3 configured
╭ models ──────────────────────────────────────────────────────────────────────────────────╮
│       name                 runtime      status    pid      endpoint                      │
│ › ○   bonsai-8b            llama_cpp    stopped   —        http://127.0.0.1:8005         │
│   ○   gemma-4-26b-a4b      llama_cpp    stopped   —        http://127.0.0.1:8001         │
│   ○   gemma-4-31b          llama_cpp    stopped   —        http://127.0.0.1:8002         │
╰──────────────────────────────────────────────────────────────────────────────────────────╯
╭ log bonsai-8b ───────────────────────────────────────────────────────────────────────────╮
│slot update_slots: id  2 | task 432 | n_tokens = 276, memory_seq_rm [276, end)            │
│slot init_sampler: id  2 | task 432 | init sampler, took 0.04 ms, tokens: text = 277,     │
│total = 277                                                                               │
│slot update_slots: id  2 | task 432 | prompt processing done, n_tokens = 277,             │
│batch.n_tokens = 4                                                                        │
│srv  params_from_: Chat format: Hermes 2 Pro                                              │
│slot print_timing: id  1 | task 383 |                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────╯
╭ CPU   3% ───── 41°C ╮╭ GPU   1% ───── 38°C ╮╭ RAM 15.1/32G ── 47% ╮╭ PWR 0.1W ───────────╮
│                     ││                     ││                     ││                     │
│                     ││                     ││                     ││                     │
│                     ││                     ││▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂││                     │
│                     ││                     ││█████████████████████││                     │
│▁▁▂▂▁▁▂▁▁▂ ▁▁▁▂▃▂▂▂▂▁││        ▁ ▂     ▁▃▁▁ ││█████████████████████││▁▁▁▁▁▁▁▁ ▂ ▁ ▁▁▂▁▃▃▃ │
╰─────────────────────╯╰─────────────────────╯╰─────────────────────╯╰─────────────────────╯
 ↑↓/jk select  g/G top/bottom  q/Esc quit  use CLI for start/stop

The metrics pane appears only on Apple Silicon Macs (the metrics feature uses macmon's IOKit bindings, which require aarch64-apple-darwin). On Intel Macs and Linux the TUI renders without it.

Keybindings

| Key | Action |
| --- | --- |
| ↑/↓, j/k | Select model |
| g/Home | Jump to first row |
| G/End | Jump to last row |
| q/Esc | Quit |

Log highlighting

Log lines are colorized by heuristic keyword match:

| Keyword | Rendering |
| --- | --- |
| error / fatal / panic | reversed accent (alarm) |
| warn | bold accent |
| listening / loaded / ready | accent |
| slot / init / srv | dimmed |
| everything else | terminal default |

Architecture

modelctl/
├── Cargo.toml
├── src/
│   ├── main.rs              # clap dispatch
│   ├── config.rs            # TOML loading, shell expansion, starter template
│   ├── state.rs             # PID file, log file, process liveness via nix
│   ├── runtime/
│   │   ├── mod.rs           # `trait Runtime { build_command }`
│   │   └── llama_cpp.rs     # first adapter
│   ├── cmd/                 # start, stop, restart, status, logs, config, about
│   ├── tui.rs               # ratatui dashboard (feature "tui")
│   └── metrics.rs           # macmon system metrics (feature "metrics")
└── .gitignore

Adding a new runtime

  1. Create src/runtime/<name>.rs implementing Runtime::build_command.
  2. Register it in runtime::for_name.
  3. Reference it as runtime = "<name>" in models.toml.

That's the entire contract. The top-level start/stop/restart code paths are runtime-agnostic — they just spawn whatever Command the runtime returns.
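
As a sketch, here is what a hypothetical mlx_lm adapter might look like. The Runtime signature shown is illustrative; mirror the real one in src/runtime/mod.rs:

```rust
use std::process::Command;

// Illustrative stand-in for the trait in runtime/mod.rs; the real
// signature may take a config struct rather than loose parameters.
pub trait Runtime {
    fn build_command(
        &self,
        binary: &str,
        model: &str,
        host: &str,
        port: u16,
        extra_args: &[String],
    ) -> Command;
}

// Hypothetical adapter for an `mlx_lm.server`-style launcher.
pub struct MlxLm;

impl Runtime for MlxLm {
    fn build_command(
        &self,
        binary: &str,
        model: &str,
        host: &str,
        port: u16,
        extra_args: &[String],
    ) -> Command {
        let mut cmd = Command::new(binary);
        cmd.arg("--model").arg(model)
            .arg("--host").arg(host)
            .arg("--port").arg(port.to_string())
            .args(extra_args); // passed through verbatim, as in the llama_cpp adapter
        cmd
    }
}
```

The spawning, PID tracking, and log plumbing all live above this layer, so an adapter only decides how a config entry becomes an argv.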

Development

# Format + lint + test + build
cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release

# Refresh third-party license bundle (required before committing any dep changes)
cargo install cargo-about  # one-time
cargo about generate about.hbs > THIRD_PARTY_LICENSES.md

THIRD_PARTY_LICENSES.md is generated from Cargo.lock by cargo-about and bundled into the binary via include_str! so modelctl about prints everything at runtime. Regenerate it whenever dependencies change.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT License

at your option.

Third-party crates bundled into the release binary are listed in THIRD_PARTY_LICENSES.md. The same content is available at runtime with modelctl about.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in modelctl by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


Copyright © 2026 Wrale Ltd.