An agentic code review bot that uses Claude AI to review GitHub pull requests and push events. It posts inline review comments on PRs, commit comments on pushes to develop, and sends Slack notifications.
Reviewotron includes a multi-agent security analysis pipeline that detects injection, XSS, command injection, authentication, authorization, and SSRF vulnerabilities. Security findings go through adversarial validation before being reported, keeping noise low.
- How It Works
- Setup
- Configuration
- Security Review Pipeline
- Slack Integration
- State and Persistence
- CLI Usage
- Cost Tracking
- Limitations
- Known Issues
- Troubleshooting
Reviewotron runs as an HTTP server that receives GitHub webhook events. It can review on PR open/update, on pushes to develop, or when someone posts a REVIEW comment on a PR. All triggers are off by default — see Defaults below.
For each enabled trigger, the bot:
- Receives the webhook at the `/github` endpoint
- Validates the signature using the configured webhook secret (HMAC-SHA256)
- Fetches the repo config from `.reviewotron.json` in the repo (via GitHub API), or uses defaults
- Fetches the diff for the PR or push (for `REVIEW` comments, also fetches the full PR via the API to recover `head.sha`, since `issue_comment` webhooks don't carry it)
- Filters the diff: removes ignored paths, checks size limits
- Runs review plugins concurrently:
  - General review: Claude analyzes the diff for bugs, style, logic, performance, etc.
  - Security review: a multi-agent pipeline scans for vulnerabilities (see below)
- Posts results:
  - PR events: a single GitHub PR review with inline comments
  - Push events: commit comments for critical/warning findings + a Slack message
  - `REVIEW` comments: same as PR events
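
The signature validation step can be sketched as below. This is illustrative Python, not the project's OCaml code: GitHub sends the hex HMAC-SHA256 of the raw request body in the `X-Hub-Signature-256` header, prefixed with `sha256=`. The function name `signature_is_valid` is hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, header: str) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the raw request body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, so a mismatch does not leak
    # how many leading characters were correct.
    return hmac.compare_digest(expected.encode(), header.encode())
```

The HMAC is computed over the raw bytes of the request, so the check has to run before the payload is parsed or re-serialized.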
GitHub Webhook (POST /github)
│
├─ Signature validation (HMAC-SHA256)
├─ Event parsing (pull_request, push, or issue_comment)
├─ Config fetch from .reviewotron.json
├─ Diff fetch + filtering
│
├─ General Review Plugin (Claude Sonnet)
│ └─ Structured output: summary + findings
│
├─ Security Review Plugin (multi-agent)
│ ├─ Triage Agent (Haiku) → route signals
│ ├─ Analysis Agents (Sonnet, parallel) → candidate findings
│ ├─ Validator Agent (Sonnet) → confirm/reject
│ └─ Memory Curator (Haiku, async) → update memory
│
├─ Merge + deduplicate findings
│
└─ Post results
├─ PR → GitHub PR review (inline comments)
└─ Push → commit comments + Slack notification
| Event | Trigger | Gated by | Output |
|---|---|---|---|
| `pull_request` (opened, reopened, ready_for_review) | PR opened, reopened, or marked ready | `auto_review_pr_open` | GitHub PR review with inline comments |
| `pull_request` (synchronize) | New commits pushed to a PR | `auto_review_pr_sync` | GitHub PR review with inline comments |
| `push` (to `refs/heads/develop`) | Code pushed to develop | `review_pushes_to_develop` | Commit comments + Slack message |
| `issue_comment` (created, on a PR, body equals `REVIEW`) | Manual trigger via PR comment | `auto_review_on_comment` | GitHub PR review with inline comments |
The REVIEW trigger is exact-match: the comment body must equal the literal string REVIEW after trimming whitespace. Anything else (including REVIEW please or quoted text) is ignored silently. The bot must have the pull_request GitHub App permission and the Issue comment webhook event subscribed.
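A minimal sketch of the exact-match rule, in illustrative Python rather than the project's OCaml. The function name `is_review_trigger` is hypothetical; the behavior (trim, then compare against the literal string) is the one described above.

```python
def is_review_trigger(comment_body: str) -> bool:
    """True only when the trimmed comment body is exactly the string REVIEW."""
    return comment_body.strip() == "REVIEW"


# Surrounding whitespace is tolerated; anything else in the body is not.
examples = ["REVIEW", "  REVIEW\n", "REVIEW please", "> REVIEW", "review"]
results = [is_review_trigger(body) for body in examples]
```

Here `results` is `[True, True, False, False, False]`: the comparison is against the literal string, so the lowercase form does not match either.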
Events are processed asynchronously — the webhook returns 200 accepted immediately, and the review runs in the background.
All four automatic-review triggers default to false. A repo without a .reviewotron.json (or one that doesn't set the relevant flags) receives no reviews. Opt in via .reviewotron.json:
| Flag | Effect when true |
|---|---|
| `auto_review_pr_open` | Review PRs on open / reopen / ready-for-review |
| `auto_review_pr_sync` | Review PRs when new commits land on them |
| `review_pushes_to_develop` | Review pushes to the develop branch |
| `auto_review_on_comment` | Review when someone posts a `REVIEW` comment on a PR |
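
The gating can be sketched as a lookup from event to flag, with every flag falling back to `false`. This is illustrative Python under stated assumptions: the helper names `gating_flag` and `is_enabled` are hypothetical, and the real dispatch lives in the OCaml code.

```python
from typing import Optional

# All four triggers are opt-in.
DEFAULTS = {
    "auto_review_pr_open": False,
    "auto_review_pr_sync": False,
    "review_pushes_to_develop": False,
    "auto_review_on_comment": False,
}
PR_OPEN_ACTIONS = {"opened", "reopened", "ready_for_review"}


def gating_flag(event: str, action: Optional[str] = None,
                ref: Optional[str] = None) -> Optional[str]:
    """Return the config flag that gates this event, or None if never reviewed."""
    if event == "pull_request" and action in PR_OPEN_ACTIONS:
        return "auto_review_pr_open"
    if event == "pull_request" and action == "synchronize":
        return "auto_review_pr_sync"
    if event == "push" and ref == "refs/heads/develop":
        return "review_pushes_to_develop"
    if event == "issue_comment" and action == "created":
        return "auto_review_on_comment"
    return None


def is_enabled(repo_config: dict, event: str, action: Optional[str] = None,
               ref: Optional[str] = None) -> bool:
    """Apply the repo's .reviewotron.json values on top of the defaults."""
    flag = gating_flag(event, action, ref)
    if flag is None:
        return False
    return bool({**DEFAULTS, **repo_config}[flag])
```

With an empty config, `is_enabled({}, "pull_request", "opened")` is `False`, which matches the behavior of a repo that has no `.reviewotron.json`.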
Manual REVIEW comments bypass the dedup that protects the automatic flow from re-reviewing the same head SHA — by design, since the manual trigger means the user wants a fresh review.
- OCaml toolchain with opam
- An Anthropic API key
- A GitHub personal access token (or GitHub App installation) for each repo
- (Optional) A Slack bot token for push notifications
make build # Build the project
make test # Run tests
make fmt # Format code
make clean # Clean build artifacts
Create a `secrets.json` file (see `secrets.json.example`):
{
"repos": [
{
"url": "https://github.com/org/repo",
"gh_token": "ghp_xxxxxxxxxxxx",
"gh_hook_secret": "your-webhook-secret"
}
],
"anthropic_api_key": "sk-ant-xxxxxxxxxxxx",
"slack_access_token": "xoxb-xxxxxxxxxxxx"
}
Fields:
| Field | Required | Description |
|---|---|---|
| `repos` | Yes | List of repositories to monitor |
| `repos[].url` | Yes | Full GitHub repository URL (e.g. `https://github.com/org/repo`) |
| `repos[].gh_token` | Yes* | GitHub personal access token with `repo` scope |
| `repos[].gh_hook_secret` | No | Webhook secret for HMAC signature validation |
| `repos[].auth` | Yes* | Alternative to `gh_token`: GitHub App installation auth (see below) |
| `anthropic_api_key` | Yes | Anthropic API key for Claude |
| `slack_access_token` | No | Slack bot token for posting messages |
*Either gh_token or auth must be set per repo. Using gh_token is the simpler option.
Instead of a personal access token, you can authenticate as a GitHub App installation:
{
"repos": [
{
"url": "https://github.com/org/repo",
"auth": [
"AppInstallation",
{
"installation_id": "12345678",
"client_id": "Iv1.xxxxxxxxxx",
"pem": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
}
],
"gh_hook_secret": "your-webhook-secret"
}
]
}
App installation tokens are automatically refreshed and cached (55-minute TTL).
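A TTL cache of this kind can be sketched as follows. GitHub installation access tokens expire after one hour, so a 55-minute TTL refreshes them with a few minutes to spare. The class and the `fetch_token` callback are hypothetical, written in illustrative Python; they do not mirror the OCaml implementation.

```python
import time
from typing import Callable, Dict, Tuple

TOKEN_TTL_SECONDS = 55 * 60  # the 55-minute TTL mentioned above


class InstallationTokenCache:
    """Caches one token per installation and refetches once the TTL has passed."""

    def __init__(self, fetch_token: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch_token is a stand-in for the call that exchanges an app JWT
        # for an installation access token.
        self._fetch_token = fetch_token
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, installation_id: str) -> str:
        now = self._clock()
        cached = self._cache.get(installation_id)
        if cached is not None and now - cached[1] < TOKEN_TTL_SECONDS:
            return cached[0]  # still fresh
        token = self._fetch_token(installation_id)
        self._cache[installation_id] = (token, now)
        return token
```

Injecting the clock keeps the sketch testable without waiting for real time to pass.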
Configure a webhook in your GitHub repository settings:
| Setting | Value |
|---|---|
| Payload URL | https://your-server:1338/github |
| Content type | application/json |
| Secret | Same value as gh_hook_secret in secrets.json |
| Events | Select Pull requests and Pushes. Also select Issue comments if you enable `auto_review_on_comment` |
./reviewotron run --port 1338 --secrets secrets.json --state state.json
Verify it's running:
curl http://localhost:1338/ping
Each repo can have a `.reviewotron.json` file in its root. This is fetched from the repo via the GitHub Contents API on each event. If the file doesn't exist, defaults are used.
{
"max_diff_lines": 2000,
"max_files": 50,
"max_tokens_per_review": 100000,
"model": "claude-sonnet-4-6",
"ignored_paths": ["*.test.js", "vendor/"],
"ignored_authors": ["dependabot[bot]"],
"auto_review_pr_open": false,
"auto_review_pr_sync": false,
"review_pushes_to_develop": false,
"auto_review_on_comment": false,
"system_prompt_override": null,
"slack_channel": "#code-reviews",
"show_review_cost": false,
"review_plugins": {
"general": {
"enabled": true,
"system_prompt_override": null
},
"security": {
"enabled": false,
"vuln_classes": ["injection", "xss", "command_injection", "authn", "authz", "ssrf"],
"triage_model_tier": "fast",
"analysis_model_tier": "standard",
"validator_model_tier": "standard",
"confidence_threshold": "medium",
"memory_max_tokens": 5000
}
}
}
| Field | Default | Description |
|---|---|---|
| `max_diff_lines` | `2000` | Maximum total diff lines to review. PRs exceeding this are skipped. |
| `max_files` | `50` | Maximum files (currently used for informational purposes). |
| `max_tokens_per_review` | `100000` | Token budget hint for the review agent. |
| `model` | `claude-sonnet-4-6` | Model ID for the general review agent. |
| `ignored_paths` | `[]` | Glob patterns for files to exclude from review. Supports `*` and `**` wildcards. |
| `ignored_authors` | `[]` | GitHub usernames whose PRs/pushes should be skipped. |
| `auto_review_pr_open` | `false` | Review PRs when they are opened, reopened, or marked ready. |
| `auto_review_pr_sync` | `false` | Review PRs when new commits are pushed to them. |
| `review_pushes_to_develop` | `false` | Review pushes to the develop branch. |
| `auto_review_on_comment` | `false` | Review when someone posts a top-level PR comment whose body is exactly `REVIEW` (after trimming). Requires the GitHub App to subscribe to Issue comment events. |
| `system_prompt_override` | `null` | Replace the default general review system prompt entirely. |
| `slack_channel` | `null` | Slack channel for push review notifications. Requires `slack_access_token` in secrets. |
| `show_review_cost` | `false` | Append a cost summary footer to PR reviews. |
| `review_plugins` | (see below) | Per-plugin configuration. |
`review_plugins.general` fields:
| Field | Default | Description |
|---|---|---|
| `enabled` | `true` | Enable/disable the general code review. |
| `system_prompt_override` | `null` | Override the general review prompt (plugin-level). |
`review_plugins.security` fields:
| Field | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable/disable security analysis. |
| `vuln_classes` | All 6 classes | Which vulnerability types to scan for. |
| `triage_model_tier` | `"fast"` | Model tier for the triage agent. |
| `analysis_model_tier` | `"standard"` | Model tier for per-class analysis agents. |
| `validator_model_tier` | `"standard"` | Model tier for the adversarial validator. |
| `confidence_threshold` | `"medium"` | Minimum triage confidence to trigger analysis. `"high"` = only high-confidence signals. `"medium"` = high + medium. `"low"` = all signals. Note: signals whose vuln class is explicitly listed in `vuln_classes` always pass through regardless of this threshold. |
| `memory_max_tokens` | `5000` | Target size limit for the repo's security memory file. |
Model tiers:
| Tier | Model | Typical Use |
|---|---|---|
| `"fast"` | `claude-haiku-4-5-20251001` | Triage, memory curator |
| `"standard"` | `claude-sonnet-4-6` | Analysis agents, validator, general review |
| `"strong"` | `claude-opus-4-6` | Reserved for complex codebases |
Vulnerability classes (values for `vuln_classes`):
| Value | Description |
|---|---|
| `"injection"` | SQL injection, NoSQL injection, query string construction |
| `"xss"` | Cross-site scripting (reflected, stored, DOM-based) |
| `"command_injection"` | OS command injection via exec/system/popen |
| `"authn"` | Authentication bypass, weak token validation, missing expiry |
| `"authz"` | Authorization flaws, IDOR, missing permission checks |
| `"ssrf"` | Server-side request forgery via user-controlled URLs |
Reviewotron skips events in these cases:
- Bot senders: any login ending in `[bot]`
- Ignored authors: usernames in the `ignored_authors` list
- Non-reviewable actions: PR closed, edited, or other non-code-change actions
- Draft PRs: skipped until marked ready
- Already reviewed: same PR + head SHA (or same push after SHA) already processed
- Empty diff: all files filtered by `ignored_paths`
- Diff too large: exceeds `max_diff_lines`
- Non-develop pushes: only `refs/heads/develop` is reviewed
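
Several of these checks can be sketched as a single function that returns the first matching skip reason. This is illustrative Python, not the project's code: the function name, its parameters, the order of the checks, and the `"diff too large"` string are assumptions. The other three strings are the skip reasons the server logs.

```python
from typing import Optional


def skip_reason(sender: str, is_draft: bool, diff_lines: int,
                config: dict) -> Optional[str]:
    """Return why an event would be skipped, or None if it should be reviewed."""
    if sender.endswith("[bot]"):
        return "bot sender"
    if sender in config.get("ignored_authors", []):
        return "ignored author"
    if is_draft:
        return "draft PR"
    # 2000 is the documented default for max_diff_lines.
    if diff_lines > config.get("max_diff_lines", 2000):
        return "diff too large"
    return None
```

Returning a reason string rather than a boolean keeps the log message and the decision in one place.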
When the security plugin is enabled, every diff goes through a multi-agent pipeline:
Scans the diff for security-relevant patterns and classifies them by vulnerability type. This is intentionally biased toward over-flagging — it's cheap to run an analysis agent that finds nothing, costly to miss a real issue.
The triage agent outputs signals with confidence levels (high, medium, low). The confidence_threshold config controls which signals proceed to analysis.
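The threshold comparison can be sketched as an ordering over the three confidence levels. This is illustrative Python with hypothetical names, and it models only the threshold itself, not any other routing rule.

```python
# Higher rank means the triage agent is more confident in the signal.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def passes_threshold(signal_confidence: str, threshold: str = "medium") -> bool:
    """True when a triage signal is confident enough to reach an analysis agent."""
    return CONFIDENCE_RANK[signal_confidence] >= CONFIDENCE_RANK[threshold]


signals = ["high", "medium", "low"]
kept_at_medium = [s for s in signals if passes_threshold(s, "medium")]
```

With the default `"medium"` threshold, `kept_at_medium` is `["high", "medium"]`; lowering the threshold to `"low"` keeps all three.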
For each flagged vulnerability class, a specialized agent runs deep analysis:
- Source identification — Where does user-controlled input enter?
- Sink identification — Where does data reach a dangerous operation?
- Data flow tracing — Can the source reach the sink? Traces through variables, function calls, returns.
- Sanitization evaluation — Is there adequate, context-correct sanitization on the path?
Analysis agents can fetch additional files from the repo via the GitHub Contents API when they need to trace a data flow beyond the diff.
All candidate findings from all analysis agents pass through a single validator agent. It acts as an adversarial false-positive filter, checking:
- The claimed source actually accepts external input
- The claimed sink actually performs the dangerous operation
- Every step in the flow path is backed by evidence (file + line)
- The sanitization assessment is correct
Findings that fail validation are dropped. This is by design — a noisy security reviewer that cries wolf loses developer trust. Dropped findings are logged for offline prompt tuning.
After the review is posted, a curator agent runs asynchronously to update the repo's security memory with learnings from the review. This is fire-and-forget — it doesn't block the review.
| Analysis Confidence | Post-Validation Severity |
|---|---|
| High + Confirmed | Critical |
| Medium + Confirmed | Warning |
| Low + Confirmed | Warning |
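
Read as code, the table above is a small mapping. The sketch below is illustrative Python with a hypothetical function name; it returns `None` for findings the validator rejected, since those are dropped rather than reported.

```python
from typing import Optional


def final_severity(analysis_confidence: str, confirmed: bool) -> Optional[str]:
    """Map an analysis confidence and validator verdict to a reported severity."""
    if not confirmed:
        return None  # rejected findings are dropped, not downgraded
    return "critical" if analysis_confidence == "high" else "warning"
```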
Push reviews (to develop) optionally send a Slack notification. This requires:
- A `slack_access_token` in `secrets.json`: a Slack bot token (`xoxb-...`) with `chat:write` permission
- A `slack_channel` set in the repo's `.reviewotron.json`
The message includes:
- Pusher name and commit count
- Link to the compare view on GitHub
- Review summary text
- Finding counts (critical, warnings, suggestions)
- Color-coded: red if any critical findings, green otherwise
If the security plugin encountered an error, a note is appended to the Slack message.
If slack_access_token is not configured, Slack posting is silently skipped.
The --state flag enables persistent state tracking. The state file (JSON) records:
- PR reviews: repo URL, PR number, head SHA, timestamp, review costs
- Push reviews: repo URL, after SHA
This prevents duplicate reviews — if the same PR at the same commit SHA is already recorded, the review is skipped. State is trimmed to the 500 most recent records per repo.
Without --state, state is in-memory only and lost on restart. This means reviews may be duplicated after a server restart.
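The dedup check and the 500-record trim can be sketched as follows. This is illustrative Python that models PR records only and keeps them in memory; the class and method names are hypothetical, and the real state file stores more fields (timestamp, costs).

```python
from typing import Dict, List, Tuple

MAX_RECORDS_PER_REPO = 500


class ReviewState:
    """Tracks which (PR number, head SHA) pairs have been reviewed, per repo."""

    def __init__(self) -> None:
        # repo URL -> records in insertion order, oldest first
        self._records: Dict[str, List[Tuple[int, str]]] = {}

    def already_reviewed(self, repo_url: str, pr_number: int, head_sha: str) -> bool:
        return (pr_number, head_sha) in self._records.get(repo_url, [])

    def record(self, repo_url: str, pr_number: int, head_sha: str) -> None:
        records = self._records.setdefault(repo_url, [])
        records.append((pr_number, head_sha))
        # Keep only the most recent entries; this is a no-op below the limit.
        del records[:-MAX_RECORDS_PER_REPO]
```

One consequence of trimming is that a very old PR can be reviewed again at the same SHA once its record has aged out.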
The security pipeline maintains per-repo memory files at memory/{repo-slug}.md. These are plain-text markdown files (target ~5000 tokens) that accumulate knowledge about the repo:
- Architecture notes (frameworks, DB access patterns, auth middleware)
- Known safe patterns (parameterized queries, auto-escaping templates)
- Known risk areas (shell command construction, raw HTML rendering)
- Suppressions (accepted risks with context)
Memory is injected into every security agent's prompt, reducing redundant file fetching and pattern re-discovery across reviews.
Updates go through a queue file (memory/{repo-slug}.queue) for distributed safety — multiple reviewotron instances can append to the queue, and the curator processes it serially.
When an agent's structured output can't be parsed, a debug dump is saved to debug/{repo-slug}/{sha-prefix}/. These contain the raw agent output for diagnosing prompt or parsing issues.
reviewotron run [OPTIONS]
| Option | Default | Description |
|---|---|---|
| `-p`, `--port` | `1338` | HTTP server port |
| `--secrets` | `secrets.json` | Path to secrets file |
| `--config-filename` | `.reviewotron.json` | Config filename to look for in repos |
| `--state` | (none, in-memory) | Path to state file for persistence |
| `--logfile` | (stderr) | Log file path |
| `--loglevel` | (default) | Log level: debug, info, warn, error |
reviewotron check --event-type pull_request --payload payload.json [OPTIONS]
Parses and displays a GitHub webhook payload without starting the server or performing any review. Useful for verifying payload parsing.
| Option | Required | Description |
|---|---|---|
| `--event-type` | Yes | GitHub event type (`pull_request` or `push`) |
| `--payload` | Yes | Path to JSON payload file |
| `--secrets` | No | Path to secrets file (defaults to `secrets.json`; must exist for initialization) |
| Path | Description |
|---|---|
| `/ping` | Health check; returns uptime |
| `/github` | GitHub webhook receiver |
Every agent call tracks token usage and estimates cost:
- Per agent: input tokens, output tokens, cache read tokens, cache creation tokens, model ID, number of tool-use turns, files fetched, estimated USD cost
- Per plugin: aggregated agent costs (general, security)
- Per review: total across all plugins
Costs are:
- Logged at `info` level after each review
- Stored in `state.json` alongside the review record (when state persistence is enabled)
- Optionally shown in the PR review footer (when `show_review_cost: true`)
Cost footer example:
Review cost: 5 agents (general: 1 agent, security: 4 agents), ~$0.42
Costs are estimated using a built-in pricing table that includes prompt caching rates:
| Model Family | Input | Output | Cache Write (5m) | Cache Read |
|---|---|---|---|---|
| Claude Opus 4.x | $5.00/MTok | $25.00/MTok | $6.25/MTok | $0.50/MTok |
| Claude Sonnet 4.x | $3.00/MTok | $15.00/MTok | $3.75/MTok | $0.30/MTok |
| Claude Haiku 4.5 | $1.00/MTok | $5.00/MTok | $1.25/MTok | $0.10/MTok |
Cache write tokens are charged at 1.25x the base input price (5-minute TTL). Cache read tokens are charged at 0.1x the base input price. Cache token counts are extracted from the Anthropic API response and tracked per-agent.
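As a worked example of the arithmetic, the sketch below applies the rates from the table to one agent call. It is illustrative Python: the function name and the family keys are hypothetical, and it assumes that the input token count excludes cached tokens, as the Anthropic API reports them separately.

```python
# USD per million tokens: (input, output), taken from the pricing table above.
PRICES = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}
CACHE_WRITE_MULTIPLIER = 1.25  # 5-minute TTL cache writes
CACHE_READ_MULTIPLIER = 0.1


def estimate_cost(family: str, input_tokens: int, output_tokens: int,
                  cache_write_tokens: int = 0, cache_read_tokens: int = 0) -> float:
    """Estimate the USD cost of one agent call."""
    input_price, output_price = PRICES[family]
    total = (
        input_tokens * input_price
        + output_tokens * output_price
        + cache_write_tokens * input_price * CACHE_WRITE_MULTIPLIER
        + cache_read_tokens * input_price * CACHE_READ_MULTIPLIER
    )
    return total / 1_000_000


cost = estimate_cost("sonnet", input_tokens=20_000, output_tokens=2_000,
                     cache_write_tokens=10_000, cache_read_tokens=50_000)
print(f"${cost:.4f}")  # prints $0.1425
```

The four terms here are $0.0600 input, $0.0300 output, $0.0375 cache write, and $0.0150 cache read.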
The pricing table is a single record in the codebase (lib/cost_tracking.ml) — update it when prices change.
PRs with more than max_diff_lines (default 2000) total diff lines are skipped entirely. There is no partial review — it's all or nothing. For large PRs, consider breaking them into smaller ones.
Only pushes to refs/heads/develop are reviewed. Other branches, including main/master, are not reviewed on push. PR reviews cover all branches.
- The general review plugin fetches up to 5 key files for additional context (added or modified files only)
- Security analysis agents can fetch any file via `get_file_content`, bounded by the agent's `max_steps` limit
- All file fetches use the PR head SHA as the git ref, so agents see the PR branch state (not the default branch)
The security pipeline performs static analysis on the diff and referenced files. It cannot:
- Execute code or run tests
- Detect runtime-only vulnerabilities
- Analyze compiled/minified code meaningfully
- Check infrastructure configuration (Terraform, Docker, etc.)
- 6 vulnerability classes are supported. Other classes (e.g., cryptographic weaknesses, deserialization, path traversal) are not covered.
- The triage agent may miss security signals in unusual code patterns. Bumping `triage_model_tier` to `"standard"` (Sonnet) can improve recall at higher cost.
- AuthN/AuthZ/SSRF analysis from diff context alone is inherently limited. These classes produce the most false negatives.
If no gh_hook_secret is configured for a repo, webhook signature validation is skipped — the event is accepted without verification. While the review will fail at the GitHub API step if no auth token is configured, it's best practice to always set a webhook secret.
Duplicate review prevention relies on the state file. Without --state, or after a server restart with in-memory-only state, the same PR/push may be reviewed again.
Multiple reviews can run concurrently (events are processed via Lwt.async). The security memory queue handles concurrent appends safely, but there's no global rate limiting on Anthropic API calls.
If reviews are not appearing:

- Check the webhook delivery log in GitHub (Settings > Webhooks > Recent Deliveries)
- Verify the server is running: `curl http://your-server:1338/ping`
- Check the server logs for skip reasons:
  - `"bot sender"`: the event was from a bot account
  - `"ignored author"`: the author is in `ignored_authors`
  - `"action ... not reviewable"`: the PR action doesn't trigger reviews
  - `"draft PR"`: mark the PR as ready for review
  - `"already reviewed at ..."`: duplicate detection fired
- Check that the repo URL in `secrets.json` matches exactly (including `https://github.com/...`)

Common errors in the logs:

- `"no auth configured for repo ..."`: the repo URL in the webhook doesn't match any entry in `secrets.json`
- `"failed to fetch config"`: GitHub API error fetching `.reviewotron.json` (check token permissions)
- `"triage agent failed"` / `"analysis agent failed"`: Claude API error (check `anthropic_api_key`, rate limits)
- `"failed to post review"`: GitHub API error posting the review (check token scopes: needs `repo` or `pull_request:write`)

If security findings are missing:

- Check that `review_plugins.security.enabled` is `true` in `.reviewotron.json` (it is `false` by default)
- Check the `confidence_threshold`: `"high"` is very selective. Try `"medium"` or `"low"`
- Check the logs for `"triage: no actionable signals"` (the diff may not contain security-relevant code)
- Check for `"validator rejected"` messages: the finding was detected but rejected as a false positive
- Bump `analysis_model_tier` to `"strong"` for complex codebases

When an agent produces output that can't be parsed as structured JSON, a debug dump is saved to debug/{repo-slug}/{sha-prefix}/. Look here when you see "failed to parse ... output" in the logs.
- No rate limiting for Anthropic API calls. Concurrent reviews (e.g., multiple PRs opened at once) will all call the Anthropic API simultaneously. There is no built-in throttling or queue. The SDK handles 429 errors with automatic retry and exponential backoff, so transient rate limits self-heal. At typical usage (a handful of monitored repos), this is unlikely to be an issue.
src/
reviewotron.ml CLI entrypoint (cmdliner: run + check commands)
request_handler.ml HTTP server, webhook routing, signature validation
lib/
api.ml Module type signatures (Github, Agent_runner, Slack)
api_remote.ml Production implementations (real HTTP calls)
api_local.ml Mock implementations (for testing)
context.ml Application context: secrets, config cache, state
config_types.ml All configuration types ([@@deriving json])
github_types.ml GitHub API request/response types
slack_types.ml Slack API types
github.ml Event parsing, signature validation
github_auth.ml GitHub token/JWT auth (PAT + App Installation)
reviewer.ml Plugin orchestrator (Make functor)
review_plugin.ml Plugin interface type
general_review_plugin.ml General code review (single Claude agent)
security_review_plugin.ml Multi-agent security pipeline
agent_runner.ml Generic agent execution via ocaml-ai-sdk
triage_agent.ml Triage agent config + prompt
analysis_agent.ml Per-vuln-class analysis agent framework
validator_agent.ml Adversarial validation agent
memory_curator_agent.ml Memory update curator agent
security_types.ml All security pipeline types
security_tools.ml get_file_content tool for agents
security_memory.ml Memory file + queue I/O
review_types.ml Finding, severity, review output types
review_format.ml Finding → PR comment / Slack formatting
review_prompt.ml General review prompt construction
cost_tracking.ml Per-agent + per-review cost estimation
diff_parser.ml Unified diff parser + path filtering
state.ml / state_types.ml Persistent state (review dedup)
http_util.ml HTTP request helper
test/
test.ml Main test suite (golden-file tests)
test_diff_parser.ml Diff parser unit tests
test_security_corpus.ml Security corpus test runner (calls Claude — on-demand)
test_helpers.ml Test context setup
mock_api_responses/ Golden-file fixtures
mock_payloads/ Sample webhook payloads
security_corpus/ Synthetic vulnerable/safe diffs per vuln class
The codebase uses OCaml functors for testability — Reviewer.Make takes Github, Agent_runner, and Slack module implementations, so tests can inject mock versions (Api_local) without any HTTP calls.