AgentLabX

Open research-automation platform

A researcher defines a topic. A team of LLM-powered agents takes it end-to-end — reviewing literature, forming a plan, preparing data, running experiments with baselines and ablations, interpreting results, writing a report, and critiquing it — with a principal-investigator agent overseeing strategic decisions. Local-first, observable at every step, and reproducible by construction.

Quickstart · Vision · SRS · Plans

What AgentLabX is

A rewrite of an earlier prototype, focused on producing research that resembles real lab output rather than a prompt chain. Every stage is a swappable module with a formal input/output contract, every tool is a standards-compliant MCP server, every credential is per-user and encrypted, and every step is observable.

The end product is a reproducible research artifact — literature survey · research plan · executed experiments with baselines and ablations · interpretation · written report · peer review — that a peer can hand-check number-by-number.

The research pipeline

  ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐
  │    DISCOVERY     │──>│  IMPLEMENTATION  │──>│     SYNTHESIS    │
  ├──────────────────┤   ├──────────────────┤   ├──────────────────┤
  │ literature_review│   │ data_exploration │   │  interpretation  │
  │ plan_formulation │   │ data_preparation │   │  report_writing  │
  │                  │   │  experimentation │   │   peer_review    │
  └──────────────────┘   └──────────────────┘   └──────────────────┘
           ▲                      ▲ │                     │
           │                      │ │                     │
           └──────────────────────┴─┴────── backtrack ────┘
                      (any stage may target any earlier stage;
                       stage-owned policy, partial rollback preserves
                       accepted artifacts; PI consulted only at
                       final-decision / clarification gates)

Each stage declares its input contract, output contract, required tools, and completion criteria. Multiple implementations of each stage can coexist behind the same contract — a simple baseline, a curated multi-turn version, a wrapper around a reference implementation — so users can compare method quality head-to-head on the same task.

Stages advance forward by default; if a stage uncovers evidence that invalidates earlier work it can request a backtrack with partial rollback that preserves accepted artifacts (literature entries, data splits, metrics). The PI agent — configured by the user — is consulted sparingly, only at final-decision gates or when a stage hits something only a human user can resolve (e.g., a missing API key).

Core principles

🔁 Reproducibility

Every experiment records seed, environment hash, dependency snapshot, run command, container image, and git ref. An agentlabx reproduce <artifact> CLI rebuilds any run bit-for-bit on a clean machine.

📦 Contracts over conventions

Every stage declares its I/O schema and completion criteria in Pydantic. Validation is deterministic; implementations are swappable behind the contract. typing.Any is banned project-wide — a stage that says nothing cannot be called "done".

👤 User sovereignty

Credentials live encrypted under the user's OS keyring, never in .env files or config repositories, never process-global. Multi-user from day one; solo use is the degenerate case under the same code path.

🔍 Observable over opaque

Every agent turn, tool call, stage transition, and PI decision is emitted as a structured event — consumable by the local UI, external test harnesses, and audit logs. No hidden work.

🧩 Standards-compliant tools

External tools (paper search, code execution, browser, filesystem, shared memory) are integrated as MCP servers. Bring your own or use the bundled set. Stages declare the capabilities they require; the orchestrator refuses to start if a capability is missing.

🏗️ Incremental build-up

The platform is assembled one stage at a time, each with its own spec, plan, and review. Plans declare the contract (what to build); subagent execution writes the code (how to build it). The two stay separate.

Roadmap

Three layers, built in strict order. Each stage delivers working, testable software; the next stage does not begin until its predecessor's gate passes.

Layer A — Backend framework

Complete the full framework before any research stage exists. All stage I/O contracts are defined upfront in Stage A4 so implementations slot into a known shape rather than reshaping the framework mid-stream.

A1 — Foundation infrastructure — auth, sessions, encrypted credentials, event bus, plugin registry, migrations, CLI, test shell
A2 — LLM provider module — LiteLLM router · traced wrapper (events + cost + budget cap) · mock LLM server · per-user encrypted key wiring · /api/llm/* provider/model endpoints
A3 — MCP host + bundled servers — arxiv-search · semantic-scholar · code-execution (sandboxed) · browser · filesystem · memory
A4 — Stage contract framework — Stage Protocol · Pydantic I/O contracts for every pipeline stage upfront · reproducibility-contract dataclass · plugin discovery
A5 — RAG component — Chroma · three-index design (project corpus / lab reference library / project artifact index) · citation verifier
A6 — Orchestrator + traffic engine + zones — forward routing · backtrack edge tracking · partial rollback · per-edge retry caps · checkpoint + resume · assist mode · graph extraction API
A7 — Current-run experiment notes — per-run server-local notes surface
A8 — Configurable agent layer — user-maintained agent definitions (name · system prompt · per-stage tool allow-list) · PI agent as the user's decision proxy invoked at final/clarification gates only

Layer B — Research stage implementations

Happy-path order, one at a time. Each stage ships its own internal flow, its own backtrack policy, and a stage-local real-LLM integration test. A stage is not "done" until its real-LLM test passes, and the next stage does not begin until then.

B1 — literature_review (Discovery) — curated references with grounded citations
B2 — plan_formulation (Discovery) — hypotheses · methodology · baselines · ablations
B3 — data_exploration (Implementation) — summary stats · dataset characterization
B4 — data_preparation (Implementation) — reproducible splits / transforms
B5 — experimentation (Implementation) — baseline + ablations · reproducibility contract · checkpoint+resume
B6 — interpretation (Synthesis) — grounded in actual metric values; no fabricated numbers
B7 — report_writing (Synthesis) — Markdown / LaTeX / PDF · citation verifier enforced
B8 — peer_review (Synthesis) — structured critique · backtrack signal can target any earlier stage

Layer C — Frontend, reproducibility, hardening, memory governance

Built only after Layers A + B are complete — each needs something real to exercise it.

C1 — Full frontend — main pipeline graph · per-stage inner views · backtrack animation · PI/User decision panel · artifact comparison view · cost/budget dashboard
C2 — Reproduce CLI + e2e harness — agentlabx reproduce <artifact> · semantic-fact assertion harness driving integration tests across the whole pipeline
C3 — Multi-user-on-one-install hardening — adversarial credential isolation · session audit · LAN-bind TLS audit
C4 — Shared experiment memory governance — memory MCP server governance layer · category taxonomy seeded from observed Layer B material · curator workflow · cross-install via remote memory server

Full details: SRS Part 4 — Build Roadmap.

Quick start

Five-minute walkthrough: docs/quickstart.md

# 1. Install dependencies (requires Python 3.12+, Node 20+, uv, npm)
uv sync --extra dev
(cd web && npm install)

# 2. Run the backend on loopback
uv run agentlabx serve --bind loopback --port 8765

# 3. In another terminal, run the Vite dev server
(cd web && npm run dev)
# → open http://127.0.0.1:5173

On a fresh install, the browser opens a "Create first identity" form — fill in your name, email, and passphrase to become the Owner. Alternatively, automate the first admin setup from the shell:

uv run agentlabx bootstrap-admin --display-name "Alice" --email alice@example.com
# → prompts for a passphrase (entered twice)

Either path is fine — they both hit the same backend code. The CLI variant is for headless / scripted installs (CI, container entrypoints); interactive operators can just open the browser.

On first start the server writes SQLite + event log under ~/.agentlabx/. Delete that directory to reset the install.

For LAN binding (lab deployment) you pass --bind lan --tls-cert CERT --tls-key KEY; TLS is enforced on non-loopback bind.

Architecture

agentlabx/                     Python backend (FastAPI + SQLAlchemy 2 async)
├── auth/                      Auther Protocol, Default/Token/OAuth, Identity, AuthError
├── security/                  argon2 passwords, keyring store, Fernet encrypt/decrypt
├── config/                    pydantic-settings AppSettings, BindMode, TLS gate
├── db/                        schema (6 tables), session factory, in-place migrations
├── events/                    in-process asyncio pub/sub EventBus + JsonlEventSink
├── plugins/                   PluginRegistry + importlib.metadata discovery
├── models/                    Pydantic request/response DTOs
├── server/                    FastAPI app factory, session middleware + Bearer auth,
│                              dependencies (current_identity, require_admin), routers
│                              (auth, settings, runs, health), rate limiter
└── cli/                       `agentlabx serve` + `bootstrap-admin` + `reset-passphrase`

web/                           React frontend (Vite + TypeScript strict + Tailwind + shadcn/ui)
├── src/api/                   fetch-based client with FastAPI detail-parsing
├── src/auth/                  AuthProvider + LoginPage (bootstrap-aware default mode)
├── src/components/            Layout (Claude-style sidebar), ui/ (shadcn primitives),
│                              confirm-dialog, password-input
└── src/pages/                 SettingsPage (credentials), ProfilePage (self-edit + tokens +
                               sessions), AdminPage (users), AdminActivityPage (audit),
                               RunsPage (placeholder — Layer B will populate)

docs/superpowers/              vision + SRS + per-stage implementation plans
tests/                         pytest suite (unit + integration)

Configuration: environment variables prefixed AGENTLABX_ override defaults. See agentlabx/config/settings.py.

Development

# Backend gates
uv sync --extra dev
uv run pytest -v
uv run ruff check agentlabx tests
uv run mypy agentlabx tests
uv run agentlabx serve --bind loopback

# Frontend gates
cd web
npm install
npm run lint                               # tsc --noEmit
npm run build                              # emits web/dist/ for production serve
npm run dev                                # Vite at :5173, proxies /api to :8765

Tech stack


Backend	Python 3.12 · FastAPI · uvicorn · SQLAlchemy 2 async + aiosqlite · Pydantic 2 · argon2-cffi · cryptography (Fernet) · keyring · itsdangerous · click · httpx
Frontend	React 19 · Vite 5 · TypeScript 5 strict · Tailwind 3 · shadcn/ui · @radix-ui (alert-dialog, dropdown-menu, label, slot) · TanStack Query 5 · React Router 6 · lucide-react · sonner
Testing	pytest 8 · pytest-asyncio · httpx · ruff (`ANN` incl. `ANN401`) · mypy strict (`disallow_any_explicit`, `disallow_any_generics`) · tsc --noEmit
Tooling	uv (Python deps/lockfile), npm (JS deps), Vite (bundler)

Contributing

This project follows a spec → plan → subagent-driven execution → review workflow. Authoritative docs:

Vision — docs/superpowers/specs/2026-04-15-agentlabx-vision.md — north star, non-negotiable principles.
SRS — docs/superpowers/specs/2026-04-15-agentlabx-srs.md — full system requirements, architecture, and the Layer A / B / C build roadmap.
Stage plans — docs/superpowers/plans/ — each stage gets its own contract-driven implementation plan.

New features land as a spec amendment (if the SRS diverges) + a plan doc + task-by-task subagent execution with tests and commits. Small fixes can skip the plan step.

License

Licensed under the Apache License, Version 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 377 Commits
agentlabx		agentlabx
docs		docs
tests		tests
third_party		third_party
web		web
.env.example		.env.example
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AgentLabX

Open research-automation platform

What AgentLabX is

The research pipeline

Core principles

Roadmap

Layer A — Backend framework

Layer B — Research stage implementations

Layer C — Frontend, reproducibility, hardening, memory governance

Quick start

Architecture

Development

Tech stack

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AgentLabX

Open research-automation platform

What AgentLabX is

The research pipeline

Core principles

Roadmap

Layer A — Backend framework

Layer B — Research stage implementations

Layer C — Frontend, reproducibility, hardening, memory governance

Quick start

Architecture

Development

Tech stack

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages