Know your instrument.
STAN is an open-source proteomics QC tool for Bruker timsTOF and Thermo Orbitrap mass spectrometers. It watches your raw data directories for new acquisitions, auto-detects DIA or DDA mode, runs standardized search jobs (DIA-NN for DIA, Sage for DDA) directly on your instrument workstation, computes instrument health metrics, gates your sample queue automatically on QC failure, tracks longitudinal performance in a local database, serves a dashboard, and optionally benchmarks your instrument against the global proteomics community through a crowdsourced HeLa digest dataset.
No HPC cluster required. STAN runs DIA-NN and Sage locally on the same machine as your instrument. For labs with SLURM cluster access, remote HPC execution is available as an option.
Originally built at the UC Davis Proteomics Core by Brett Stanley Phinney. Extended for Bruker timsTOF ion mobility workflows by 🧙♂️ The Peptide Wizard.
- Multi-instrument monitoring -- Bruker timsTOF and Thermo Orbitrap in a single dashboard
- DIA and DDA mode intelligence -- auto-detects acquisition mode and routes to the right search engine with the right metrics
- Run and Done gating -- automatically pauses your sample queue (HOLD flag) when a QC run fails thresholds
- Instrument Performance Score (IPS) -- a single 0-100 composite number for LC health, updated every run
- Column health tracking -- longitudinal TIC trend analysis detects column aging before it affects your data
- Precursor-first metrics -- benchmarks on precursor count (DIA) and PSM count (DDA), not protein count, because protein count is confounded by FASTA choice and inference settings
- Community HeLa benchmark -- compare your instrument against 967+ runs from labs worldwide at community.stan-proteomics.org (CC BY 4.0)
- Zero-config raw file intelligence -- STAN auto-extracts instrument model, serial number, LC system, gradient length, DIA window size, acquisition date, and DIA/DDA mode directly from .raw and .d files. The only thing you tell STAN is your column and HeLa amount.
- Anonymous lab identity -- fun pseudonyms ("Clogged PeakTail", "Caffeinated Quadrupole") with email verification so you can track your own data on the community site without revealing your lab
- Gas-gauge dashboard -- at-a-glance instrument health: 5 recent runs × 3 metrics, green/amber/red zones vs your instrument's own history
- Instrument health fingerprint -- dual-mode DDA+DIA radar chart for rapid visual diagnosis
- Plain-English failure diagnosis -- templated alerts explain what failed and what to check, no guesswork
- Privacy by design -- raw files are never uploaded; only aggregate QC metrics leave your lab
- Ion Mobility Viewer -- Full 4D feature explorer: RT×1/K₀ heatmap, interactive 3D RT×m/z×1/K₀ scatter, charge distribution, FWHM histogram, and intensity histogram. Charge state buttons (+?/+1–+6) are always visible for instant filtering. Includes unassigned (z=0) features that 4DFF could not deconvolute, shown in gray across all views (3D scatter, landscape, waterfall, both ion clouds). Singly charged (z=+1) ions are fully shown. Works from DIA-NN report.parquet without 4DFF; .features is used when available for higher density (up to 15,000 points).
- Enzyme Efficiency tab -- Per-run missed cleavage distribution (0/1/2/3+), variable modification frequencies (oxidation, deamidation, acetylation), and enzyme health summary cards, all parsed from Modified.Sequence in report.parquet at 1% FDR
- Instrument Health tab -- Local Levey-Jennings control charts (±1σ/2σ/3σ) for 6 key metrics, radar/fingerprint chart showing percentile rank within your own history, LC system breakdown table, and expandable "Understanding the Metrics" educational accordion. No cloud required: all analytics use your local SQLite DB
- Immunopeptidomics tab -- Peptide length histogram with MHC Class I (8–14 aa) and Class II (13–25 aa) shading, charge distribution, ion cloud (m/z × 1/K₀), modification frequency bars, and filterable peptide table — designed for non-specific DIA immunopeptidomics workflows
- Spectrum Viewer -- Theoretical b/y fragment ion series with UniMod modification support; experimental overlay showing DIA-NN Best.Fr.Mz (★), RT window, 1/K₀, and precursor intensity; peptide search across report.parquet
- RawMeat-style identification-free QC -- instant spray stability, TIC by MS level, accumulation time profiles, and pressure traces directly from analysis.tdf, with no DIA-NN or Sage search required (click the RAW badge on any .d run)
- Carafe2 experiment-specific libraries -- after each DIA run, STAN automatically trains a timsTOF-specific spectral library using Carafe2 in the background and injects it into the next DIA-NN search. No manual steps required. (green LIB badge when a library is ready)
- timsplot integration -- launch the timsplot Shiny visualization app directly from the TIC viewer or via stan timsplot
- MsBackendTimsTof integration -- stan msbackend generates pre-filled R scripts for Bioconductor-based timsTOF data access
| Vendor | Instruments | Raw Format | Acquisition Modes |
|---|---|---|---|
| Bruker | timsTOF Ultra 2, Ultra, Pro 2, SCP | .d directory | diaPASEF, ddaPASEF |
| Thermo | Astral, Exploris 480, Exploris 240 | .raw file | DIA, DDA |
Download install-stan.bat (right-click → Save As) and double-click it. This single script handles everything:
- Installs Python 3.10+ if not present
- Clones the STAN repository and installs it via pip
- Auto-installs DIA-NN from GitHub releases (.msi for 2.x, with admin elevation if needed)
- Auto-installs Sage from GitHub releases
- Handles SSL/proxy issues automatically (common on UC Davis and other institutional networks)
- Uses --no-cache-dir to ensure fresh code on every install
To update an existing install, use update-stan.bat -- it downloads the latest code from GitHub and reinstalls. Both .bat files self-update by downloading their latest version from GitHub on each run.
Note: The old install_stan.bat (underscore) was removed to avoid confusion. Only install-stan.bat (hyphen) and update-stan.bat exist now.
DIA-NN 2.x is preferred over 1.x when both are installed. If the installer cannot find or install a search engine, stan setup and stan baseline will prompt you for a custom executable path.
git clone https://github.com/bsphinney/stan.git
cd stan
pip install -e ".[dev]"
You will also need DIA-NN and Sage installed and on your PATH. See the search engine sections below for download links.
pip install stan-proteomics # not yet published
stan init
Creates ~/.stan/ and copies default configuration templates into it.
Run stan setup for an interactive 6-question wizard that picks your instrument, directories, LC method, FASTA, and error telemetry preferences -- no YAML editing required. If your watch directory already has existing raw files, the wizard offers to run stan baseline at the end to process them retroactively. Or create the config files manually:
- instruments.yml -- instrument watch directories and settings
- thresholds.yml -- QC pass/warn/fail thresholds per instrument model
- community.yml -- HuggingFace token and community benchmark preferences
stan watch
Starts the watcher daemon. It monitors directories configured in instruments.yml, detects new raw files, determines acquisition mode, and runs DIA-NN or Sage locally on your machine. (Requires a working instruments.yml in ~/.stan/ -- see Configuration. DIA-NN and Sage must be installed and on your PATH.)
stan dashboard
Serves the FastAPI backend at http://localhost:8421. The API is fully functional -- browse /docs for Swagger UI. The dashboard includes a Config tab for managing instruments. (The full React frontend is planned; the current UI is a basic HTML page with config management.)
stan baseline
Processes existing HeLa QC files retroactively -- ideal for building historical QC data from a directory of past runs. Features:
- Recursive discovery of .d and .raw files in subdirectories
- Auto-detects gradient length from raw files (Thermo via TRFP metadata, Bruker via Frames.Time in analysis.tdf)
- Auto-detects LC system from raw file metadata (U3000, Vanquish Neo, Evosep, etc.)
- Auto-downloads the community FASTA from the HF Dataset if not cached locally
- Pre-flight tests DIA-NN and Sage before processing (runs a quick test search to verify they work)
- If a search engine is not found or fails pre-flight, prompts for a custom executable path
- Prefers DIA-NN 2.x over 1.x when both are installed
- Resume support (tracks progress in ~/.stan/baseline_progress.json)
- Duplicate detection (skips files already in the database)
- Scheduling options: run now, tonight (8 PM), or weekend (Saturday 8 AM)
stan status # show configuration and database summary
stan column-health # assess LC column condition from longitudinal TIC trends
stan version # print STAN version
stan timsplot # launch timsplot Shiny visualization app (port 8422)
stan carafe # manage Carafe2 spectral library building
stan msbackend # generate R script for MsBackendTimsTof data access
See Companion Tools for details on each.
STAN provides three complementary views of timsTOF ion mobility data directly in the dashboard. No Bruker 4D Feature Finder is required — all panels populate automatically from the DIA-NN report.parquet that STAN already generates.
The Ion Mobility tab (in the main nav) shows all panels for any timsTOF .d run in the database.
| Priority | Source | Covers |
|---|---|---|
| 1st | Bruker 4DFF .features SQLite | All detected features (identified + unidentified) |
| 2nd | DIA-NN report.parquet | Identified precursors at 1% FDR; works for every run automatically |
A "4DFF ✓" (blue) badge shows when .features is the source; "DIA-NN ✓" (lighter blue) when falling back to report.parquet.
| Panel | What it shows |
|---|---|
| RT × 1/K₀ heatmap | Log₁₀-intensity density map rendered on HTML5 Canvas with labelled RT and 1/K₀ axis ticks |
| 3D Feature Map (interactive Plotly) | RT × m/z × 1/K₀ scatter, coloured by charge state, sized by intensity |
| Filter panel | Charge state toggles + m/z range + RT range — instant client-side filtering of the 3D scatter |
| Charge distribution | Bar chart of precursor charge states |
| RT FWHM histogram | Chromatographic peak width distribution (seconds) |
| Intensity histogram | Log₁₀ precursor intensity distribution |
| diaPASEF window layout | m/z × 1/K₀ isolation grid read directly from analysis.tdf |
The heatmap canvas (760 × 320 px internal resolution) uses a perceptually uniform blue-to-gold colour scale with proper padded axis margins. All data is streamed as JSON — the browser never reads raw files directly.
The Enzyme tab shows per-run digest quality metrics parsed from DIA-NN Modified.Sequence at 1% FDR.
| Panel | What it shows |
|---|---|
| Missed Cleavages | Horizontal bar chart for 0/1/2/3+ missed cleavages with % labels. Colour-coded green→red. |
| Modifications | Top variable modifications sorted by frequency (oxidation, deamidation, acetylation, etc.). Fixed Carbamidomethyl excluded. |
| Enzyme Health Summary | Traffic-light cards: specificity (≥70% MC=0), ≥2 missed cleavage rate (<10%), oxidation rate (<5%) |
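As a rough illustration of how the three summary cards could be derived, the sketch below counts missed cleavages from DIA-NN style Modified.Sequence strings. The function names are hypothetical, the cleavage rule (cut after K or R unless the next residue is P) is an assumed trypsin rule, and treating the oxidation rate as the share of precursors carrying UniMod:35 is also an assumption rather than STAN's confirmed implementation.

```python
import re
from collections import Counter

# Inline modification tags such as "(UniMod:35)" are removed to get the bare sequence.
_MOD_TAG = re.compile(r"\([^)]*\)")


def strip_mods(modified_sequence: str) -> str:
    return _MOD_TAG.sub("", modified_sequence)


def missed_cleavages(sequence: str) -> int:
    """Count internal K/R residues not followed by P (assumed trypsin rule)."""
    return sum(
        1
        for i, aa in enumerate(sequence[:-1])  # the C-terminal residue is never "missed"
        if aa in "KR" and sequence[i + 1] != "P"
    )


def enzyme_summary(modified_sequences: list[str]) -> dict:
    n = len(modified_sequences)
    if n == 0:
        raise ValueError("no precursors to summarise")
    # Bin counts as 0 / 1 / 2 / 3+, matching the bar chart categories.
    bins = Counter(min(missed_cleavages(strip_mods(s)), 3) for s in modified_sequences)
    pct_mc0 = 100.0 * bins[0] / n
    pct_mc2plus = 100.0 * (bins[2] + bins[3]) / n
    # Assumption: oxidation rate = share of precursors with at least one UniMod:35 tag.
    pct_oxidised = 100.0 * sum("UniMod:35" in s for s in modified_sequences) / n
    return {
        "pct_mc0": pct_mc0,
        "pct_mc2plus": pct_mc2plus,
        "pct_oxidised": pct_oxidised,
        "specificity_ok": pct_mc0 >= 70,         # >=70% with zero missed cleavages
        "missed_cleavage_ok": pct_mc2plus < 10,  # <10% with two or more
        "oxidation_ok": pct_oxidised < 5,        # <5% oxidised
    }
```

In practice the input would be the Modified.Sequence column of report.parquet after filtering to 1% FDR; that filtering step is left out here.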
Clicking the burnt-orange RAW badge on any .d run opens the RawMeat Viewer, which reads analysis.tdf directly. No DIA-NN or Sage search is required.
Inspired by the discontinued RawMeat tool (Vast Scientific), reimplemented from scratch for timsTOF using the open TDF SQLite schema.
| Metric | Source column | Notes |
|---|---|---|
| TIC trace (MS1) | Frames.SummedIntensities where MsMsType = 0 | Blue trace |
| TIC trace (MS2) | Frames.SummedIntensities where MsMsType ≠ 0 | Orange trace |
| Spray stability score (0-100) | Frames.MaxIntensity (MS1 frames) | 100 − (dropouts × 10); a dropout is a frame below 25% of its local 11-frame median |
| Spray CV % | Frames.MaxIntensity (MS1 non-zero) | Global CV of peak intensities across the run |
| Dropout list | Rolling-median detection | Retention times (min) where spray instability is detected |
| Dynamic range (log₁₀) | max(MS1 MaxIntensity) / median(MS1 MaxIntensity) | Typical values: 1–3 log₁₀ |
| Accumulation time profile | Frames.AccumulationTime, Frames.RampTime | Trap fill time trace; shows TIMS duty cycle |
| Pressure trace | Frames.Pressure | Available on some instrument configurations |
| Frame summary | Frames counts by MsMsType | Total frames, MS1/MS2 split, RT range |
| Instrument metadata | GlobalMetadata | Instrument name, serial number, software version, acquisition date, operator, method |
Why MaxIntensity instead of SummedIntensities for stability? SummedIntensities is the sum of all ion signals in a frame and is dominated by the LC gradient plateau — its median is close to the peak, which compresses dynamic range and makes dropouts hard to detect. MaxIntensity is the single highest-intensity scan in the frame and is more sensitive to spray events.
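The dropout rule from the table can be sketched in a few lines. This is a minimal illustration, not STAN's actual code: the function names are hypothetical, the rolling window is simply truncated at the start and end of the run, and the score is clamped at 0 so that it stays on the stated 0-100 scale. Both of those choices are assumptions.

```python
import math
from statistics import median


def find_dropouts(max_intensity, window=11, fraction=0.25):
    """Indices of MS1 frames whose MaxIntensity is below `fraction` of the local median."""
    half = window // 2
    dropouts = []
    for i, value in enumerate(max_intensity):
        # Assumption: the window is truncated at the run edges rather than padded.
        lo, hi = max(0, i - half), min(len(max_intensity), i + half + 1)
        if value < fraction * median(max_intensity[lo:hi]):
            dropouts.append(i)
    return dropouts


def spray_stability_score(max_intensity):
    """100 - 10 per dropout, clamped at 0 (the clamp is an assumption)."""
    return max(0, 100 - 10 * len(find_dropouts(max_intensity)))


def dynamic_range_log10(max_intensity):
    """log10 of max over median MS1 MaxIntensity."""
    return math.log10(max(max_intensity) / median(max_intensity))
```

A steady trace with a single collapsed frame scores 90; ten or more dropouts drive the score to 0.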
MsMsType values in Bruker TDF:
- 0 → MS1 (survey / PASEF precursor frames)
- 2 → DDA MS2 (traditional)
- 8 → ddaPASEF (data-dependent PASEF MS2)
- 9 → diaPASEF (data-independent PASEF / windows)
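Because analysis.tdf is a plain SQLite file, the mapping above can be turned into a small mode check using only the standard library. The sketch below is illustrative: detect_bruker_mode is a hypothetical name, and returning "unknown" for runs that contain no MS2 frames or a mix of DDA and DIA frame types is an assumed policy.

```python
import sqlite3
from pathlib import Path

# MS2 frame types from the list above; 0 (MS1) carries no mode information.
MSMS_TYPE_TO_MODE = {2: "DDA", 8: "DDA", 9: "DIA"}


def detect_bruker_mode(d_dir) -> str:
    """Return "DIA", "DDA", or "unknown" for a Bruker .d directory."""
    tdf = Path(d_dir) / "analysis.tdf"
    if not tdf.is_file():
        raise FileNotFoundError(tdf)
    con = sqlite3.connect(str(tdf))
    try:
        rows = con.execute(
            "SELECT DISTINCT MsMsType FROM Frames WHERE MsMsType != 0"
        ).fetchall()
    finally:
        con.close()
    modes = {MSMS_TYPE_TO_MODE[t] for (t,) in rows if t in MSMS_TYPE_TO_MODE}
    # Assumption: anything other than exactly one mode is reported as unknown.
    return modes.pop() if len(modes) == 1 else "unknown"
```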
These tools are not bundled with STAN but integrate seamlessly through CLI commands. They each serve different analysis needs beyond STAN's core QC scope.
timsplot is a Python Shiny web application for interactive visualization of timsTOF data including 2D ion mobility maps, TIC overlays, and spectrum viewers.
stan timsplot
On first run, STAN clones timsplot into ~/.stan/tools/timsplot/ and launches it as a local Shiny app on port 8422. Subsequent runs reuse the cached clone. A "timsplot ↗" button appears in the TIC viewer modal for runs with .d files, opening timsplot in a new browser tab.
MsBackendTimsTof is an R/Bioconductor backend for the Spectra framework that provides direct read access to Bruker .d files, including frame-level MS1 and MS2 spectra.
stan msbackend
Generates a pre-filled R script for the run you select. The script installs BiocManager and the required Bioconductor packages on first use and opens the .d directory directly via the Bruker proprietary SDK (bundled with the Bruker software installation).
Note: MsBackendTimsTof requires the Bruker TDF SDK (.dll/.so), which is included with Bruker DataAnalysis or freely downloadable from Bruker's developer portal. It is not redistributable and is not bundled with STAN.
Carafe is a tool for building experiment-specific spectral libraries for timsTOF DIA data. It uses deep learning to predict peptide fragmentation patterns and ion mobility values calibrated to your specific instrument conditions.
STAN makes Carafe completely seamless:
- After each DIA-NN search completes, STAN automatically runs Carafe in a background daemon thread using the .d file and FASTA.
- The resulting library is saved to ~/.stan/libraries/<instrument>/carafe_latest.tsv.
- On the next DIA-NN search for the same instrument, STAN automatically injects the library via --lib.
- The run record shows a green LIB badge when a Carafe library was used.
stan carafe # list available Carafe libraries and runs
stan carafe --run <id> # manually run Carafe on a specific run
Setup in instruments.yml:
- name: "timsTOF Ultra"
  vendor: "bruker"
  carafe_enabled: true
  carafe_fasta: "/path/to/human.fasta"
  carafe_java: "java" # path to Java ≥11 runtime (default: "java")
Install Carafe:
Download the JAR from github.com/Noble-Lab/Carafe/releases and place it in ~/.stan/tools/carafe/carafe-<version>.jar. STAN auto-detects the latest version.
Raw data directory (watched by watcher daemon)
  |
  |  file stable for stable_secs
  v
detector.py -- reads .d/analysis.tdf or .raw metadata
  |
  +-- DIA --> local DIA-NN --> report.parquet
  +-- DDA --> local Sage   --> results.sage.parquet
  |
extractor.py + chromatography.py
  |
evaluator.py --> PASS / WARN / FAIL
  |                   |
SQLite DB        queue.py (HOLD flag)
  |
dashboard (FastAPI, port 8421)
  |  (React frontend planned)
community/submit.py --> HF Dataset
Data flow: The watcher daemon detects new raw files and checks for file stability (size stops changing). Once stable, the detector reads instrument metadata to determine DIA or DDA mode. STAN runs DIA-NN (for DIA) or Sage (for DDA) locally on your instrument workstation as a subprocess with standardized parameters. After the search completes, STAN extracts QC metrics from the results, evaluates them against per-instrument thresholds, writes a HOLD flag if the run fails, stores everything in SQLite for longitudinal tracking, and optionally submits to the community benchmark.
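The evaluation step in this flow can be sketched as a lookup followed by a comparison. The key names follow the _min/_max suffix convention of thresholds.yml (see Configuration); the function names are hypothetical, and layering a model-specific entry on top of the default entry key by key is an assumption about how the fallback works. How STAN decides WARN is not described here, so the sketch only distinguishes PASS from FAIL.

```python
def resolve_thresholds(thresholds: dict, model: str, mode: str) -> dict:
    """Start from the default limits and overlay any model-specific ones (assumed merge)."""
    limits = dict(thresholds.get("default", {}).get(mode, {}))
    limits.update(thresholds.get(model, {}).get(mode, {}))
    return limits


def evaluate(metrics: dict, limits: dict) -> tuple[str, list[str]]:
    """Compare each metric with its *_min or *_max limit and collect failures."""
    failures = []
    for key, limit in limits.items():
        name, _, kind = key.rpartition("_")  # "n_precursors_min" -> ("n_precursors", "min")
        value = metrics.get(name)
        if value is None:
            continue  # metric not measured for this run
        if kind == "min" and value < limit:
            failures.append(f"{name}={value} is below the minimum of {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}={value} is above the maximum of {limit}")
    return ("FAIL" if failures else "PASS"), failures
```

A FAIL result is the point at which the HOLD flag would be written; the failure strings are the raw material for a plain-English alert.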
Execution modes:
- Local (default) -- DIA-NN and Sage run as subprocesses on the same machine. Install them once, add to PATH, and STAN handles the rest. A typical QC HeLa run searches in 5-15 minutes on a modern workstation.
- SLURM (optional) -- For labs with HPC cluster access, set execution_mode: "slurm" in instruments.yml. STAN submits batch jobs via SSH/paramiko and polls for completion.
STAN uses a deliberate metric hierarchy. This is a core design decision that differentiates STAN from other QC tools:
Fragment XICs / precursor <-- purest instrument signal
Precursor count @ 1% FDR <-- PRIMARY metric for DIA (Track B)
PSM count @ 1% FDR <-- PRIMARY metric for DDA (Track A)
Peptide count <-- secondary for both modes
Protein count <-- contextual only, never used for ranking
Protein count is intentionally excluded from primary benchmarking. It is heavily confounded by FASTA database choice, protein inference algorithm, and FDR propagation settings. Precursor and PSM counts with a standardized search provide a much cleaner signal of instrument performance.
STAN powers an open, crowdsourced HeLa digest benchmark hosted on HuggingFace. Labs worldwide submit aggregate QC metrics (never raw files) from their HeLa standard runs, enabling cross-lab instrument performance comparisons.
Community submission is entirely opt-in. By default, community_submit is false and nothing leaves your machine. STAN works fully standalone for local QC monitoring, gating, and longitudinal tracking without ever contacting an external service. Set community_submit: true per instrument only if you want to participate in the benchmark.
Browse the community dashboard: community.stan-proteomics.org (also at huggingface.co/spaces/brettsp/stan)
The community site is live with 967 runs across Fusion Lumos, timsTOF HT, and Exploris 480 instruments.
All community benchmark submissions use a frozen, standardized search with pinned FASTA, spectral libraries, and search parameters hosted in the HF Dataset repository. This is what makes cross-lab comparisons valid -- every lab searches the same library with the same settings, so differences in precursor counts reflect actual instrument performance, not search configuration.
| Track | Mode | Search Engine | Primary Metric | Secondary Metrics |
|---|---|---|---|---|
| Track A | DDA | Sage | PSM count @ 1% FDR | Peptide count, mass accuracy, MS2 scan rate |
| Track B | DIA | DIA-NN | Precursor count @ 1% FDR | Peptide count, median CV, IPS |
| Track C | Both | Both | Instrument fingerprint | Radar chart (6 axes), peptide recovery ratio |
Track C unlocks when a lab submits both a DDA and a DIA run from the same instrument within 24 hours. The resulting six-axis radar chart provides a comprehensive instrument health fingerprint covering mass accuracy, duty cycle, spectral quality, precursor depth, quantitative reproducibility, and fragment sensitivity.
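The 24-hour pairing rule is simple enough to state as code. This is a sketch with a hypothetical function name; it assumes the runs passed in all belong to one instrument and that a gap of exactly 24 hours still counts.

```python
from datetime import datetime, timedelta


def track_c_unlocked(runs, window=timedelta(hours=24)) -> bool:
    """runs: (mode, acquired_at) pairs for ONE instrument, mode being "DDA" or "DIA"."""
    runs = list(runs)
    dda_times = [t for mode, t in runs if mode == "DDA"]
    dia_times = [t for mode, t in runs if mode == "DIA"]
    # Unlocked if any DDA/DIA pair lies within the window (inclusive, by assumption).
    return any(abs(a - b) <= window for a in dda_times for b in dia_times)
```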
Submissions are compared only within their cohort, defined by three dimensions: instrument family, throughput (SPD), and injection amount. This ensures a 50 ng run on a timsTOF Ultra at 60 SPD is compared against other 50 ng timsTOF Ultra 60 SPD runs, not against a 500 ng Astral at 200 SPD.
Throughput buckets (SPD -- samples per day):
SPD is the primary throughput unit. Labs set their Evosep, Vanquish Neo, or equivalent method by SPD in instruments.yml. Gradient length in minutes is accepted as a fallback for custom LC methods.
| Bucket | SPD Range | Evosep Method | Traditional Equivalent |
|---|---|---|---|
| 200+spd | 200 or more | 500/300/200 SPD | ~2-5 min gradient |
| 100spd | 80-199 | 100 SPD | ~11 min gradient |
| 60spd | 40-79 | 60 SPD (most popular), Whisper 40 | ~21-31 min gradient |
| 30spd | 25-39 | 30 SPD | ~44 min gradient |
| 15spd | 10-24 | Extended | ~60-88 min gradient |
| deep | under 10 | -- | >2h gradient |
Amount buckets (injection amount in ng):
| Bucket | Range | Typical Use |
|---|---|---|
| ultra-low | 25 ng or less | Single-cell, very low input |
| low | 26-75 ng | Standard QC (50 ng default) |
| mid | 76-150 ng | Moderate load |
| standard | 151-300 ng | Traditional 200-250 ng QC |
| high | 301-600 ng | High-load methods |
| very-high | over 600 ng | Specialized applications |
The default injection amount is 50 ng and is configurable per instrument in instruments.yml via the hela_amount_ng field.
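The two bucket tables translate directly into lookup functions. The sketch below uses hypothetical function names; the boundaries are taken from the tables, and fractional inputs such as 25.5 ng fall into the next bucket up, which is an assumption since the tables only list whole numbers.

```python
def spd_bucket(spd: float) -> str:
    """Map samples-per-day to the throughput bucket from the SPD table."""
    if spd >= 200:
        return "200+spd"
    if spd >= 80:
        return "100spd"
    if spd >= 40:
        return "60spd"
    if spd >= 25:
        return "30spd"
    if spd >= 10:
        return "15spd"
    return "deep"


def amount_bucket(ng: float) -> str:
    """Map injection amount in ng to the amount bucket."""
    if ng <= 25:
        return "ultra-low"
    if ng <= 75:
        return "low"
    if ng <= 150:
        return "mid"
    if ng <= 300:
        return "standard"
    if ng <= 600:
        return "high"
    return "very-high"


def cohort_key(instrument_family: str, spd: float, hela_amount_ng: float = 50) -> tuple:
    """The three cohort dimensions: instrument family, throughput, injection amount."""
    return (instrument_family, spd_bucket(spd), amount_bucket(hela_amount_ng))
```

For example, cohort_key("timsTOF Ultra", 60, 50) gives ("timsTOF Ultra", "60spd", "low"), so that run is only ranked against other runs with the same key.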
A minimum of 5 submissions per cohort is required before the leaderboard activates.
DIA Score (Track B):
DIA_Score = 0.40 x percentile_rank(n_precursors)
          + 0.25 x percentile_rank(n_peptides)
          + 0.20 x (100 - percentile_rank(median_cv_precursor))
          + 0.15 x percentile_rank(ips_score)
DDA Score (Track A):
DDA_Score = 0.35 x percentile_rank(n_psms)
          + 0.25 x percentile_rank(n_peptides_dda)
          + 0.20 x percentile_rank(pct_delta_mass_lt5ppm)
          + 0.20 x percentile_rank(ms2_scan_rate)
Scores are computed nightly within each cohort by a GitHub Actions workflow. A score of 75 means your instrument outperformed 75% of comparable submissions. (Nightly consolidation is implemented but will not run until the HF Dataset has live submissions.)
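A minimal sketch of the composite score follows. The weights in each formula sum to 100%, so with percentile ranks on a 0-100 scale the result also stays within 0-100; the code applies them as fractions for that reason. The percentile_rank definition used here (share of cohort values strictly below the run's value) is an assumption, as are the function and constant names.

```python
DIA_WEIGHTS = {"n_precursors": 0.40, "n_peptides": 0.25,
               "median_cv_precursor": 0.20, "ips_score": 0.15}
DDA_WEIGHTS = {"n_psms": 0.35, "n_peptides_dda": 0.25,
               "pct_delta_mass_lt5ppm": 0.20, "ms2_scan_rate": 0.20}
LOWER_IS_BETTER = {"median_cv_precursor"}  # enters the score as 100 - rank


def percentile_rank(value, cohort_values):
    """Percent of cohort values strictly below `value` (assumed definition)."""
    return 100.0 * sum(v < value for v in cohort_values) / len(cohort_values)


def community_score(run: dict, cohort: list[dict], weights: dict) -> float:
    """Weighted sum of per-metric percentile ranks within one cohort."""
    score = 0.0
    for metric, weight in weights.items():
        rank = percentile_rank(run[metric], [r[metric] for r in cohort])
        if metric in LOWER_IS_BETTER:
            rank = 100.0 - rank
        score += weight * rank
    return score
```

Passing DIA_WEIGHTS scores a Track B run and DDA_WEIGHTS a Track A run; the cohort list should contain only submissions from the same cohort.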
To encourage participation and make QC a point of pride (or healthy shame), STAN will recognize top and bottom performers each year:
| Award | Criteria | Prize |
|---|---|---|
| Golden Spray Tip | Highest median community score across all cohorts, minimum 50 submissions | Trophy + bragging rights |
| Most Consistent | Lowest CV of community scores over the year (the lab that never has a bad day) | Trophy |
| Most Improved | Largest year-over-year score increase | Trophy |
| The Clogged Emitter | Lowest median community score, minimum 50 submissions | Trophy of Shame (opt-in -- you have to claim it) |
Awards are computed from the community benchmark dataset and announced annually. Labs must have community_submit: true and at least 50 submissions in the calendar year to qualify. The Clogged Emitter is opt-in -- your lab is never publicly shamed without consent. All awards are meant in good fun and to motivate better instrument maintenance across the field.
- Raw files are never uploaded -- only aggregate QC metrics
- Patient or sample metadata is never collected
- Serial numbers are stored server-side but never exposed in API responses or downloads
- Anonymous submissions are supported (display_name can be left blank)
- Submissions can be deleted by filing a GitHub issue with the submission_id
- Community dataset licensed under CC BY 4.0
All configuration files live in ~/.stan/. They are YAML files that can be edited with any text editor. The watcher daemon hot-reloads instruments.yml every 30 seconds without requiring a restart. The dashboard Config tab provides a GUI for viewing and removing instruments (the Remove button deletes duplicate entries).
Run stan init to create default configuration templates, or stan setup for the interactive wizard. Manual editing examples below.
Defines which instruments to monitor, where their raw files land, and instrument-specific settings.
# STAN instrument watcher configuration
# Hot-reloaded every 30 seconds -- no restart needed after edits
instruments:
  - name: "timsTOF Ultra"
    vendor: "bruker"
    model: "timsTOF Ultra"
    watch_dir: "D:/Data/raw" # where .d directories appear
    output_dir: "D:/Data/stan_out" # STAN writes results + HOLD flags here
    extensions: [".d"]
    stable_secs: 60 # seconds of no size change before processing
    enabled: true
    hela_amount_ng: 50 # injection amount in ng (default: 50)
    spd: 30 # samples per day (Evosep 30 SPD)
    community_submit: false # set true to share QC metrics with community benchmark
  - name: "Astral"
    vendor: "thermo"
    model: "Astral"
    watch_dir: "D:/Data/raw"
    output_dir: "D:/Data/stan_out"
    extensions: [".raw"]
    stable_secs: 30
    enabled: true
    hela_amount_ng: 50
    spd: 60 # Evosep 60 SPD
    community_submit: false # set true to opt in
# ── Optional: SLURM HPC execution ──────────────────────────────────
# Uncomment to run searches on a remote cluster instead of locally.
# Most labs do NOT need this — local execution is the default.
#
# hive:
#   host: "hive.ucdavis.edu"
#   user: "your_username"
#
# Then add to each instrument:
#   execution_mode: "slurm" # default is "local"
#   hive_partition: "high"
#   hive_account: "your-account-grp"
Vendor-specific file stability detection:
- Bruker .d: The .d directory size is checked every 10 seconds. The run is considered complete after stable_secs consecutive seconds with no size change (default: 60 seconds).
- Thermo .raw: A single binary file. Checked via mtime and size. Stable after stable_secs with no change (default: 30 seconds).
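The stability rules above can be sketched as a polling loop. This is an illustrative version, not the watcher's actual code: the function names are hypothetical, and the clock, sleep, and fingerprint functions are injectable only so the loop can be exercised without real files or real waiting.

```python
import time
from pathlib import Path


def fingerprint(path):
    """Total size for a .d directory; (size, mtime) for a single .raw file."""
    p = Path(path)
    if p.is_dir():
        return sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    st = p.stat()
    return (st.st_size, st.st_mtime_ns)


def wait_until_stable(path, stable_secs, poll_secs=10.0,
                      fingerprint_of=fingerprint,
                      clock=time.monotonic, sleep=time.sleep):
    """Block until the fingerprint has not changed for stable_secs, then return it."""
    last = fingerprint_of(path)
    changed_at = clock()
    while clock() - changed_at < stable_secs:
        sleep(poll_secs)
        current = fingerprint_of(path)
        if current != last:
            # Still being written: restart the quiet period.
            last, changed_at = current, clock()
    return last
```

With stable_secs=60 and a 10-second poll, a file that stops growing is released for processing 60 seconds after its last observed change.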
Defines QC pass/warn/fail thresholds per instrument model. A default entry applies when no model-specific entry exists.
thresholds:
  default:
    dia:
      n_precursors_min: 5000
      median_cv_precursor_max: 20.0
      missed_cleavage_rate_max: 0.20
      ips_score_min: 50
    dda:
      n_psms_min: 10000
      pct_delta_mass_lt5ppm_min: 0.70
      ms2_scan_rate_min: 10.0
  "timsTOF Ultra":
    dia:
      n_precursors_min: 10000
      median_cv_precursor_max: 15.0
      ips_score_min: 65
    dda:
      n_psms_min: 30000
      pct_delta_mass_lt5ppm_min: 0.90
Controls community benchmark participation. No HuggingFace account or token is needed -- STAN submits through a relay API automatically.
display_name: "Your Lab Name" # shown on leaderboard; blank = anonymous
hela_source: "Pierce HeLa Protein Digest Standard"
institution_type: "core_facility" # core_facility | academic_lab | industry
error_telemetry: true # opt-in anonymous error reports (set by stan setup)
STAN reads your raw files before any search and auto-detects everything it needs. You only configure two things: your LC column and your HeLa amount.
| What STAN auto-detects | Thermo .raw | Bruker .d | How |
|---|---|---|---|
| Instrument model | Orbitrap Fusion Lumos, Exploris 480, Astral, ... | timsTOF HT, Ultra, Pro, ... | TRFP metadata / TDF GlobalMetadata |
| Serial number | fsn20215, ... | 1895883.10878, ... | Same |
| Acquisition date | 09/02/2025 15:18:13 | 2024-06-04T15:32:57 | FileProperties / AcquisitionDateTime |
| DIA vs DDA mode | From method name + MS2/MS1 ratio | MsMsType in Frames table (8=DDA, 9=DIA) | Automatic |
| Gradient length (min) | 35, 60, 90, 120 | From Frames.Time in analysis.tdf | TRFP metadata / TDF Frames table |
| DIA window size (Th) | 22 Da, 3 Th, 4 Th, ... | From method name | Parsed from method + computed from scan ratio |
| LC system | Dionex UltiMate 3000, Vanquish Neo, Easy-nLC | Evosep One, nanoElute | Binary string scan (.raw) / hystar.method XML (.d) |
| LC pump model | NCS-3500RS, HPG-3400RS, ... | — | DriverId in embedded method XML |
| Autosampler | WPS-3000, ... | Standard | Same |
| Fragmentation type | HCD, CID | CID (TIMS-CID) | ScanSettings / method |
| Column oven temp | 40°C | — | ScanSettings |
| Injection volume | 2 µL | — | SampleData |
| Xcalibur method path | C:\Xcalibur\methods\gabri\Dia\ela_fDIAw22_35m.meth | — | SampleData |
On Windows, STAN auto-downloads ThermoRawFileParser on first use (~10 MB, cached in ~/.stan/tools/). On Linux/HPC, it uses the system dotnet runtime. Metadata extraction takes ~3 seconds per file.
IPS v2 is a 0-100 cohort-calibrated depth score derived from 967 real UC Davis HeLa QC runs. It uses only metrics STAN reliably measures — no reference TIC, no blank runs, works from run 1. A run at its cohort median scores 60; cohort p90 scores 90.
DIA:
IPS = 50% precursor_depth + 30% peptide_depth + 20% protein_depth
Each component scored by piecewise-linear interpolation against (instrument_family, SPD_bucket) reference p10/p50/p90.
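The piecewise-linear scoring can be illustrated as follows. Only two anchors are stated in the text (cohort median scores 60, p90 scores 90); the score assigned at p10 (30 here), the origin anchor at zero, and the capped extrapolation above p90 are all assumptions made for this sketch, and the function names and reference numbers in the usage note are hypothetical.

```python
def depth_score(value, p10, p50, p90, anchors=(30.0, 60.0, 90.0)):
    """Interpolate a 0-100 score from cohort reference percentiles (p10 < p50 < p90)."""
    if value <= 0:
        return 0.0
    # Assumption: (0, 0) is the lowest anchor and p10 maps to anchors[0].
    points = list(zip([0.0, p10, p50, p90], [0.0, *anchors]))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    # Above p90: continue the last segment's slope, capped at 100 (assumption).
    (x0, y0), (x1, y1) = points[-2], points[-1]
    return min(100.0, y1 + (y1 - y0) / (x1 - x0) * (value - x1))


def ips_dia(precursors, peptides, proteins, ref):
    """50% precursor depth + 30% peptide depth + 20% protein depth."""
    return (0.5 * depth_score(precursors, *ref["precursors"])
            + 0.3 * depth_score(peptides, *ref["peptides"])
            + 0.2 * depth_score(proteins, *ref["proteins"]))
```

Here ref would hold the (p10, p50, p90) triple for each component, looked up by (instrument_family, SPD_bucket). A run sitting exactly on all three cohort medians scores 60.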
DDA: Same structure but uses PSM counts with separate per-instrument DDA cohort references.
DDA:
IPS = 30 x identification_depth + 25 x mass_accuracy
+ 20 x sampling_quality (pts/peak) + 15 x scoring_quality + 10 x digestion
| Score Range | Interpretation |
|---|---|
| 90-100 | Excellent -- instrument performing optimally |
| 80-89 | Good -- normal operating range |
| 60-79 | Marginal -- investigate soon |
| Below 60 | Investigate -- likely instrument or LC issue |
IPS is stored for every run in the local SQLite database and included in community benchmark submissions.
STAN depends on two external search engines that you install separately. Both run locally on your instrument workstation by default.
License note: STAN does not bundle, redistribute, or include any part of DIA-NN, Sage, or ThermoRawFileParser. It calls them as external subprocesses, the same way a Makefile calls gcc. Users must install each tool separately under its own license. This is required for DIA-NN in particular because commercial use requires a paid license from Aptila Biotech or Thermo Fisher Scientific.
DIA-NN handles all DIA searches. Both Bruker .d and Thermo .raw files are passed directly to DIA-NN without conversion (DIA-NN 2.1+ has native support for both formats on Linux and Windows).
Install: Download from https://github.com/vdemichev/DiaNN/releases and add to PATH, or place the executable and set diann_path in instruments.yml.
License: DIA-NN is free for academic research use. Since STAN is designed for academic core facilities and research labs, this is the intended use case. Commercial users need to obtain a paid license separately from Aptila Biotech or Thermo Fisher Scientific — STAN does not modify the licensing terms.
Historical note: DIA-NN versions up to 1.9.1 were free for all users (academic and commercial). Starting with 1.9.2, commercial use requires a paid license while academic use remains free. DIA-NN 2.x follows the same model. STAN recommends the latest academic release.
Citation required: If STAN is useful for your work, please cite the DIA-NN paper: Demichev V, Messner CB, Vernardis SI, Lilley KS, Ralser M. Nature Methods (2020).
Community benchmark submissions use a frozen HeLa-specific empirical spectral library and a pinned FASTA (UP000005640_9606_plus_universal_contam.fasta, 21,044 entries), both hosted in the HF Dataset repository. The FASTA is auto-downloaded on first community submission or baseline run if not cached locally. MD5 hashes are verified client-side before submission.
Sage handles all DDA searches. Bruker .d files are read natively by Sage (confirmed working for ddaPASEF). Thermo .raw files require conversion to mzML via ThermoRawFileParser before Sage can process them -- this is the only conversion step in the entire STAN pipeline.
Install: Download from https://github.com/lazear/sage/releases and add to PATH, or place the executable and set sage_path in instruments.yml.
License: Sage is open source under the MIT license.
Sage includes built-in LDA rescoring that is sufficient for QC-level FDR estimation.
If you run DDA on a Thermo instrument, STAN needs ThermoRawFileParser to convert .raw to mzML before Sage can search it. This is only needed for Thermo DDA -- not for Thermo DIA (DIA-NN reads .raw natively) and not for any Bruker workflows.
License: ThermoRawFileParser is open source under the Apache 2.0 license.
For labs with SLURM cluster access, STAN can submit search jobs via SSH instead of running locally. See the HPC Guide for setup, container paths, bind mount patterns, and common errors. This includes critical gotchas about DIA-NN containers, symlinks, and invalid flags that will save you hours of debugging.
stan/
+-- pyproject.toml
+-- README.md
+-- STAN_MASTER_SPEC.md # authoritative design document
+-- CLAUDE.md # development context for Claude Code
+-- LICENSE # STAN Academic License
+-- install-stan.bat # Windows fresh install (auto-installs DIA-NN + Sage)
+-- update-stan.bat # Windows update (reinstalls from GitHub)
+-- start_stan.bat # launches dashboard + watcher + opens Chrome
+-- stan/
| +-- cli.py # CLI entry point (typer)
| +-- config.py # config loader with hot-reload
| +-- db.py # SQLite operations
| +-- setup.py # interactive 6-question setup wizard
| +-- baseline.py # retroactive QC processing from existing files
| +-- telemetry.py # opt-in anonymous error reporting
| +-- watcher/ # watchdog daemon, stability, mode detection
| +-- search/ # DIA-NN + Sage SLURM job builders
| | +-- carafe.py # Carafe2 library building (auto-triggered after DIA-NN)
| | +-- community_params.py # frozen community search parameters
| +-- metrics/ # metric extraction, IPS, iRT, scoring
| | +-- rawmeat.py # identification-free QC from analysis.tdf (RawMeat-style)
| | +-- mobility_viz.py # ion mobility heatmap + histograms from 4DFF .features
| +-- gating/ # threshold evaluation, HOLD flag, queue control
| +-- community/ # HF Dataset submit/fetch/validate
| | +-- scripts/consolidate.py # nightly GitHub Actions consolidation
| +-- dashboard/ # FastAPI backend + React frontend
+-- tests/
+-- docs/
+-- .github/workflows/
    +-- ci.yml # lint + test on push/PR
    +-- consolidate_benchmark.yml # nightly benchmark consolidation
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run a single test file
pytest tests/test_metrics.py -v
# Skip integration tests (require Hive/SLURM)
pytest tests/ -k "not integration"
# Lint
ruff check stan/
# Lint with auto-fix
ruff check stan/ --fix

Tests marked @pytest.mark.integration require Hive SLURM access and real instrument files. They are skipped in CI and can be run manually on the HPC cluster.
| Component | Status | Notes |
|---|---|---|
| CLI (stan init/setup/watch/dashboard/baseline/status/column-health/version) | Done | All commands wired up and working |
| Watcher daemon (file stability, hot-reload config) | Done | Bruker .d and Thermo .raw stability detection |
| Acquisition mode detection (Bruker .d) | Done | Reads MsmsType from analysis.tdf |
| Acquisition mode detection (Thermo .raw) | Done | Via ThermoRawFileParser metadata |
| Local DIA-NN execution (default) | Done | Subprocess-based, community-standardized params |
| Local Sage execution (default) | Done | JSON config, Thermo mzML conversion via TRFP |
| SLURM HPC execution (optional) | Done | SSH/paramiko job submission for labs with clusters |
| Metric extraction (DIA + DDA) | Done | Polars-based, from report.parquet and results.sage.parquet |
| IPS scoring | Done | 4-component composite, 0-100 scale |
| QC gating + HOLD flag | Done | Hard gates, plain-English diagnosis |
| Column health assessment | Done | Longitudinal TIC trend analysis |
| SQLite database + migrations | Done | Stores all metrics, gate results, amount_ng, spd |
| Community validation + submission | Done | Hard gates, soft flags, asset hash verification |
| Community scoring (DIA + DDA) | Done | Percentile-based within SPD/amount cohorts |
| Instrument fingerprint (Track C) | Done | 6-axis radar, failure pattern matching |
| Nightly consolidation script | Done | GitHub Actions, recomputes cohort percentiles |
| FastAPI dashboard backend | Done | API routes for runs, trends, instruments, thresholds, submission |
| SPD-first cohort bucketing | Done | Evosep 500-30 SPD, Vanquish Neo, traditional LC |
| Default config files (config/) | Done | instruments.yml, thresholds.yml, community.yml templates |
| Test fixtures (real DIA-NN/Sage output) | Planned | tests/fixtures/ is empty -- need small real output files |
| React dashboard frontend | Planned | Only a placeholder HTML page exists |
| PyPI publishing (pip install stan-proteomics) | Planned | pyproject.toml is ready, not yet published |
| HF Dataset assets (speclibs) | Partial | FASTA uploaded + MD5 verified; spectral libraries in progress |
| HF Space public dashboard | Planned | Space repo exists but not deployed |
| Community benchmark live data | Planned | Requires spectral library uploads + first submissions |
| Setup wizard (stan setup) | Done | 6-question wizard, deduplicates instruments.yml, offers baseline at end |
| Baseline builder (stan baseline) | Done | Retroactive QC processing, auto-detect gradient/LC, pre-flight search engine tests |
| Windows installer (install-stan.bat) | Done | Auto-installs Python, DIA-NN, Sage, handles SSL/proxy, self-updates |
| Windows updater (update-stan.bat) | Done | One-click reinstall, self-updates via GitHub API (bypasses CDN cache); kills all STAN processes + venv Python subprocesses before pip; adds Defender exclusion if elevated; --target fallback copies Python package directly when Defender locks stan.exe without admin rights; restores stan.exe from backup if fallback was used |
| Email reports (stan email-report) | Done | Daily + weekly HTML reports via Resend API, cron/schtasks install |
| Error telemetry (opt-in) | Done | Anonymous error reports to HF Space relay, local log at ~/.stan/error_log.json |
| Community FASTA | Done | Frozen UniProt human + contaminants (21,044 entries), MD5-verified, auto-downloaded |
| Recursive watcher | Done | Watches subdirectories, filters events inside Bruker .d directories |
| Dashboard Config tab | Done | Remove button on instrument cards for deleting duplicates |
| Outlier detection (amount mismatch) | Planned | Flag submissions where metrics don't match declared amount/SPD |
| Failed run rejection | Planned | Block near-zero results from entering benchmark (failed injection, empty spray) |
| Bruker .d XML metadata parser | Done | Reads `<N>.m/submethods.xml`, hystar.method, SampleInfo.xml for authoritative SPD + Evosep detection (v0.2.56) |
| validate_spd_from_metadata() | Done | XML → MethodName → Frames.Time span fallback chain; handles PAC method names (v0.2.55) |
| detect_lc_system() | Done | Evosep vs custom detection from .d XML tree + TrayType; powers LC filter on community TIC overlay (v0.2.56) |
| Real acquisition-date preservation | Done | insert_run stores analysis.tdf AcquisitionDateTime or fisher_py CreationDate, not insertion time (v0.2.54) |
| DIA-NN filename sanitizer | Done | Junction/symlink workaround for DIA-NN's -- parsing bug that broke PAC-style filenames (v0.2.63) |
| Dashboard error boundary + null guards | Done | React ErrorBoundary + Array.isArray guards; /api/runs empty-DB graceful fallback (v0.2.62) |
| stan repair-metadata [--push] CLI | Done | Walks local DB, re-reads raw files, updates SPD/run_date/lc_system; optionally pushes to community relay (v0.2.57) |
| stan fix-spds [--dry-run] CLI | Done | Per-run SPD correction against raw-file metadata (v0.2.55) |
| stan backfill-tic [--push] CLI | Done | Multi-source TIC recovery (Bruker TDF → DIA-NN report → Thermo fisher_py) with 128-bin downsample + relay push (v0.2.65) |
| stan baseline auto-TIC backfill sweep | Done | Recovers missing TIC traces silently at startup; no manual command needed (v0.2.65) |
| stan baseline multi-directory picker | Done | Lists every configured watch dir as a numbered choice (v0.2.61) |
| stan add-watch with QC filter prompt | Done | Interactive scan/filter preview; -y / --qc-pattern / --all-files non-interactive flags (v0.2.59) |
| stan add-watch recursive vendor detect | Done | rglob with 5000-entry scan cap for nested subdirectories (v0.2.60) |
| POST /api/update/{id} relay endpoint | Done | Metadata-only whitelist (spd, run_date, lc_system, tic_rt_bins, tic_intensity, stats); used by repair-metadata --push (v0.2.57) |
| HF Space leaderboard TTL cache + snapshot_download | Done | 5 min in-memory cache; parallel submission downloads; cache-bust on /api/submit; ?refresh=1 override (server-side) |
| Community TIC LC filter (Evosep / Custom / All) | Done | Client-side filter on submissions by lc_system with inference fallback |
| Community TIC DIA/DDA separator | Done | Dropdown with DIA/DDA/Mixed options; defaults to DIA so cycle-time differences don't corrupt median shape |
| downsample_trace(n_bins=128) helper | Done | Bins arbitrary-length TIC to canonical 128-point format before local store + submission (v0.2.64) |
| Dashboard "Today's Runs" + MiniSparkline + IPS / FWHM-sec / signed mass-acc | Done | Cleaner Run History columns, per-instrument 30-run trend sparkline, unified Precursors/PSMs column (v0.2.58) |
| Ion Mobility tab | Done | Interactive 4D viewer: RT×1/K₀ heatmap (with axis ticks), 3D RT×m/z×1/K₀ scatter, filter panel (charge/m/z/RT), charge/FWHM/intensity histograms, diaPASEF window layout |
| DIA-NN report.parquet fallback for Ion Mobility | Done | All Ion Mobility panels auto-populate from report.parquet at 1%FDR — no 4DFF required |
| 3D filter panel | Done | Charge state toggles + m/z range + RT range — instant client-side filtering, no re-fetch |
| Enzyme Efficiency tab | Done | Missed cleavage distribution (0/1/2/3+), variable PTM frequencies, enzyme health summary cards from Modified.Sequence; 12-enzyme selector (Trypsin, Lys-C, Arg-C, Chymotrypsin, RChymoSelect, Krakatoa, Vesuvius, Asp-N, ProAlanase, Pepsin, Non-specific) |
| GET /api/runs/{id}/enzyme-stats | Done | Missed cleavages + modification frequencies from DIA-NN report.parquet |
| Instrument Health tab | Done | Levey-Jennings control charts (±1σ/2σ/3σ bands) for Precursors, Peak Capacity, Mass Accuracy, MS1 Signal, Dynamic Range, Points/Peak; radar fingerprint (local percentile rank); LC system breakdown table; "Understanding the Metrics" accordion |
| Immunopeptidomics tab | Done | MHC Class I/II length histogram, charge distribution, ion cloud (m/z × 1/K₀), modification bars, filterable peptide table from report.parquet at 1% FDR |
| Spectrum Viewer | Done | Theoretical b/y ions (UniMod: CAM, Ox, Phos, Deam, Acetyl, Trimethyl, Methyl) + experimental Best.Fr.Mz overlay from DIA-NN 2.x; peptide search with stripped/modified sequence matching |
| Ion cloud visualizations | Done | Tenzer-style m/z × 1/K₀ 2D scatter and Kulej-style RT × 1/K₀ 2D scatter, charge-coloured, in Ion Mobility tab |
| Community withdraw | Done | "Stop sharing" per-run button in Community tab with 2-step confirmation; marks submitted_to_benchmark=0 locally and surfaces submission_id |
| Column/sample type filters in Community tab | Done | Dropdowns for column model and sample type for apples-to-apples benchmark comparison |
| Auto 4DFF feature finding | Done | fourdff_enabled + fourdff_path in instruments.yml triggers 4DFF automatically after each .d search |
| RawMeat-style identification-free QC (RAW badge) | Done | TIC by MS level, spray stability score, dropout detection, accumulation time, pressure trace -- from analysis.tdf only |
| GET /api/runs/{id}/rawmeat | Done | Serves RawMeat metrics; reads analysis.tdf directly, no search results needed |
| GET /api/runs/{id}/mobility-map | Done | RT×1/K0 binned grid (60×50 by default); falls back to DIA-NN report when no .features |
| GET /api/runs/{id}/mobility-stats | Done | Charge distribution + FWHM histogram + intensity histogram; falls back to DIA-NN report |
| stan/metrics/rawmeat.py | Done | Pure-Python, stdlib-only identification-free QC extractor for Bruker timsTOF |
| stan/metrics/mobility_viz.py | Done | Ion mobility map and histogram extraction from 4DFF .features files |
| stan/metrics/mobility_diann.py | Done | Ion mobility + enzyme stats extraction from DIA-NN report.parquet (fallback for all mobility panels) |
| Carafe2 seamless library pipeline | Done | Auto-triggers after every DIA-NN search; library auto-injected into next search; LIB badge on run card |
| stan/search/carafe.py | Done | JAR discovery, library management, background thread execution, DB update |
| stan carafe CLI | Done | List libraries, run Carafe on specific run |
| stan timsplot CLI | Done | Clones + launches timsplot Shiny app on port 8422 |
| stan msbackend CLI | Done | Installs R/Bioconductor packages, generates pre-filled R script for .d access |
| About tab with dual authorship | Done | Author cards (Brett Phinney + The Peptide Wizard), license summary, companion tool cards |
| timsplot ↗ button in TIC viewer | Done | Opens timsplot in new tab for any .d run |
| LC Traces tab | Done | Reads chromatography-data.sqlite from .d dirs; plots pump pressure, gradient, flow, column temp, TIC MS1/MS2, BPC as interactive Plotly charts |
| stan/metrics/chromatography_lc.py | Done | Blob decoder for HyStar/nanoElute LC system traces; times=float64 array, values=float32 array; downsamples to 2000 pts for dashboard |
| GET /api/runs/{id}/lc-traces | Done | Serves LC system traces from chromatography-data.sqlite; returns empty dict for non-.d or missing file |
| CCS tab | Done | CCS vs m/z scatter (coloured by charge) + per-charge CCS distribution histograms + median CCS table; 1/K₀ → CCS via Bruker tims_oneoverk0_to_ccs_for_mz DLL; falls back gracefully when DLL unavailable |
| Ion Detail 4-panel click-to-inspect | Done | Click any point in m/z×1/K₀, RT×1/K₀, or Method Coverage scatter → mzmine-style 4-panel panel: XIC (full run, blue), Summed Frame Spectrum (purple), EIM Mobilogram (±20s window, green), Frame Heatmap (raw m/z×1/K₀, amber crosshairs) |
| stan/metrics/ion_detail.py | Done | get_ion_detail single-pass XIC+mobilogram via extractChromatograms; get_frame_heatmap via readScans+indexToMz+scanNumToOneOverK0; get_frame_spectrum summed mobility scans |
| GET /api/runs/{id}/ion-detail | Done | Parallel-fetched XIC + EIM mobilogram for clicked ion (timsdata DLL) |
| GET /api/runs/{id}/frame-heatmap | Done | 2D raw m/z×1/K₀ log-intensity grid for nearest MS1 frame |
| GET /api/runs/{id}/frame-spectrum | Done | Summed frame spectrum (all mobility scans at given RT) |
| z=0 + z=+1 visibility fixes | Done | 4DFF: NULL-charge features included via (Charge >= 0 OR Charge IS NULL) + COALESCE(Charge,0) in all queries; DIA-NN: falsy-zero bug fixed (ch is not None instead of ch); z=+1 outside PASEF windows gets own teal trace in coverage + 3D scatter |
| Consistent charge colour palette | Done | z=0 gold #eab308, z=+1 teal #2dd4bf across all panels (cloud charts, coverage, 3D scatter, CCS, 4DFF modal ChargeChart) |
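The `downsample_trace(n_bins=128)` row in the table above describes binning an arbitrary-length TIC into a canonical 128-point trace. The sketch below shows one plausible way to do that with equal-width retention-time bins and a per-bin mean. It is an assumption about the approach, not the STAN helper itself, which is why the function carries a `_sketch` suffix; the choice of mean (rather than max or sum) and the zero fill for empty bins are also assumptions.

```python
def downsample_trace_sketch(rt, intensity, n_bins=128):
    """Bin a TIC into n_bins equal-width RT bins and return (centers, means).

    Empty bins are reported as 0.0 (an assumption made for this sketch).
    """
    if len(rt) != len(intensity):
        raise ValueError("rt and intensity must have the same length")
    if not rt:
        return [], []

    lo, hi = min(rt), max(rt)
    # Fall back to a unit width when every point shares one retention time.
    width = (hi - lo) / n_bins or 1.0

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, y in zip(rt, intensity):
        # Clamp so the final point (t == hi) lands in the last bin.
        idx = min(int((t - lo) / width), n_bins - 1)
        sums[idx] += y
        counts[idx] += 1

    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return centers, means
```

Because every run is reduced to the same number of points, traces from instruments with different cycle times can be overlaid or averaged point by point.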
- Ship default config YAML templates in `config/` so `stan init` works out of the box
- Setup wizard (`stan setup`) -- 6-question interactive config with deduplication and baseline offer
- Windows installer (`install-stan.bat`) -- auto-installs Python, DIA-NN (.msi), Sage, handles SSL/proxy
- Windows updater (`update-stan.bat`) -- one-click reinstall with self-update
- Updater: kill venv Python subprocesses by path before pip (catches watcher daemon holding `stan.exe`)
- Updater: Windows Defender exclusion for `venv\Scripts` during pip (fixes WinError 32 when elevated)
- Updater: `--target` fallback -- copies `stan/` package directly to site-packages when Defender locks `stan.exe` without admin rights; restores `stan.exe` from backup so dashboard can start
- Updater: download via GitHub API (`application/vnd.github.v3.raw`) instead of `raw.githubusercontent.com` to bypass CDN caching that could serve stale scripts
- Baseline builder (`stan baseline`) -- retroactive QC processing with auto-detect and pre-flight tests
- Upload pinned human UniProt reviewed FASTA to HF Dataset (UP000005640_9606_plus_universal_contam.fasta, 21,044 entries)
- Populate MD5 hashes in `stan/community/validate.py`
- Auto-download community FASTA on first community submission or baseline run
- Opt-in anonymous error telemetry with local log
- Recursive watcher with Bruker .d event filtering
- Dashboard Config tab with instrument Remove button
- Bruker `.d` XML method-tree parser for authoritative SPD + LC detection
- `stan repair-metadata` CLI for re-extracting SPD/run_date/lc_system from raw files
- `stan fix-spds` CLI for per-run SPD correction against raw-file metadata
- `stan backfill-tic --push` for multi-source TIC recovery with community relay push
- `stan baseline` auto-TIC-backfill sweep on startup
- `stan baseline` multi-directory picker
- `stan add-watch` with interactive QC filter prompt + non-interactive flags
- `stan add-watch` recursive vendor auto-detect for nested watch dirs
- DIA-NN filename-with-`--` sanitizer (junction/symlink workaround)
- `POST /api/update/{id}` relay endpoint with metadata whitelist
- HF Space leaderboard TTL cache + snapshot_download
- Community TIC overlay: LC system filter (Evosep/Custom/All)
- Community TIC overlay: DIA/DDA separator (defaults to DIA)
- Dashboard ErrorBoundary + null-safety for empty DB
- Dashboard Today's Runs MiniSparkline + Run History column refresh
- `downsample_trace()` for canonical 128-bin TIC shape
- Real acquisition-date preservation from `analysis.tdf.AcquisitionDateTime`
- HF Dataset historical backfill: 81/83 submissions corrected with real SPD + run_date + lc_system
- Ion Mobility tab — RT×1/K₀ heatmap (axis labels/ticks), 3D scatter, charge/FWHM/intensity histograms
- Ion Mobility -- DIA-NN `report.parquet` fallback (no 4DFF required for any panel)
- Ion Mobility -- 3D filter panel (charge state toggles, m/z range, RT range, instant client-side)
- Enzyme Efficiency tab — missed cleavages (0/1/2/3+), oxidation %, modification table, health summary
- Auto 4DFF feature finding -- `fourdff_enabled` in `instruments.yml` triggers after each `.d` search
- RawMeat-style identification-free QC (RAW badge) -- from `analysis.tdf`, no search needed
- Carafe2 seamless library pipeline -- auto-trains, auto-injects, LIB badge
- `stan timsplot` -- launches timsplot Shiny app on port 8422
- `stan msbackend` -- generates pre-filled R script for MsBackendTimsTof
- About tab -- dual authorship (Brett Phinney + The Peptide Wizard), license, companion tools; Feature Highlights grid + What's New April 2026 changelog
- Instrument Health tab — Levey-Jennings control charts, radar fingerprint, LC breakdown, Understanding the Metrics accordion
- Immunopeptidomics tab — MHC I/II length histogram, charge distribution, ion cloud, modification bars, peptide table
- Spectrum Viewer — theoretical b/y + experimental Best.Fr.Mz overlay (★), peptide search across report.parquet
- Ion cloud views — m/z × 1/K₀ (Tenzer-style) and RT × 1/K₀ (Kulej-style) 2D scatter in Ion Mobility tab
- Enzyme selector — 12 enzyme types with direction-aware missed cleavage counting in Python backend
- Community withdraw — per-run "Stop sharing" button with 2-step confirm in Community tab
- Ion Mobility charge filter -- buttons z=?/+1–+6 always visible above 4D scatter; unassigned (z=0) gold, singly-charged (+1) teal; `_MIN_CHARGE_FOR_PLOTS=0` so all charges included from 4DFF and DIA-NN
- max_features raised to 15,000 (server cap 20,000) -- 3× more ions visible in 4D scatter, landscape, and waterfall
- CCS tab — CCS vs m/z scatter + per-charge histograms + median CCS table; 1/K₀ → CCS via Bruker timsdata DLL
- Ion Detail 4-panel view — click any ion in m/z×1/K₀, RT×1/K₀, or Coverage charts → mzmine-style XIC + Summed Frame Spectrum + EIM Mobilogram + Frame Heatmap
- z=0/z=1 visibility fixes -- 4DFF `NULL`-charge features now included via `COALESCE(Charge,0)`; DIA-NN falsy-zero bug fixed (`ch is not None`); z=+1 always-visible outside PASEF windows in own teal trace
- Updater: auto-clean pip temp artifacts (`~`-prefixed dirs in `site-packages`) left by previous failed installs before each update run
- Add small real DIA-NN and Sage output files to `tests/fixtures/`
- Maintenance log UI -- enter column swaps/source cleans/PMs with date+notes, overlay as vertical markers on Trends charts
- Column tracking — log column installs (vendor, model, serial, install date) to explain TIC variance
- TIC filter by pseudonym (your traces vs community vs all)
- TIC color by lab when showing all traces
- Lumos/Exploris Thermo TIC backfill via Hive-side `report.parquet` identified-TIC path
- Thermo `.raw` fisher_py-based SPD extraction from InstrumentMethod header
- Generate and upload Astral HeLa predicted spectral library to HF Dataset
- Generate and upload timsTOF HeLa predicted spectral library to HF Dataset
- Build React frontend for dashboard (run history, trend charts, community leaderboard)
- Deploy HF Space public community dashboard
- Publish to PyPI
- Outlier detection for community submissions — flag runs where metrics are wildly inconsistent with declared amount/SPD (e.g., someone declares 50 ng but IDs suggest 500 ng injection)
- Failed run rejection — detect near-zero results (failed injection, empty file, broken spray) and block them from entering the benchmark; these should never pollute cohort percentiles
- Add Thermo `.raw` mode detection integration tests on Hive
- Points-across-peak metric (DIA + DDA): compute median FWHM, cycle time, and data points per elution peak (quantitation quality diagnostic, per Matthews & Hayes 1976)
- Community dashboard figures: SPD vs. points-across-peak (shows the quantitation cliff), faceted/colored by LC column model
- LC column as a dimension in all community dashboard figures (color, facet, or filter)
- End-to-end watcher integration test with real instrument data
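The z=0 visibility fix in the list above rests on two small behaviours that are easy to miss: SQL comparisons against `NULL` never evaluate to true, and Python treats the integer `0` as falsy. The sketch below reproduces both with an in-memory SQLite table. The table name `features` is hypothetical; only the `Charge` column and the `COALESCE(Charge,0)` / `ch is not None` patterns come from the items above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, Charge INTEGER)")
conn.executemany(
    "INSERT INTO features (Charge) VALUES (?)",
    [(None,), (0,), (1,), (2,)],
)

# A plain comparison silently drops the NULL row: NULL >= 0 is NULL, not true.
strict = conn.execute(
    "SELECT COUNT(*) FROM features WHERE Charge >= 0"
).fetchone()[0]

# Admitting NULL explicitly and mapping it to 0 keeps unassigned features.
rows = conn.execute(
    "SELECT COALESCE(Charge, 0) FROM features "
    "WHERE (Charge >= 0 OR Charge IS NULL) ORDER BY id"
).fetchall()
charges = [row[0] for row in rows]

# Falsy-zero bug: a truthiness test discards legitimate z=0 values.
kept_buggy = [ch for ch in charges if ch]
# Fix: test for None explicitly so that 0 survives the filter.
kept_fixed = [ch for ch in charges if ch is not None]

conn.close()
```

With four rows inserted, the strict query counts three, while the inclusive query returns all four with the `NULL` charge reported as 0. The truthiness filter then keeps only the nonzero charges, and the `is not None` filter keeps everything.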
| Resource | URL |
|---|---|
| STAN GitHub | github.com/bsphinney/stan |
| STAN Community Dashboard | huggingface.co/spaces/brettsp/stan |
| STAN Community Dataset | huggingface.co/datasets/brettsp/stan-benchmark |
| DE-LIMP (sibling project) | github.com/bsphinney/DE-LIMP |
STAN handles QC and instrument health monitoring. For differential expression analysis and full quantitative proteomics workflows, see DE-LIMP.
Contributions are welcome. Please:
- Fork the repository and create a feature branch
- Run `ruff check stan/` and `pytest tests/ -v` before submitting
- Include tests for new functionality (use fixtures in `tests/fixtures/`, prefer real output snippets over synthetic data)
- Open a pull request with a clear description of the change
For questions about the spec or design decisions, open a discussion on GitHub before implementing.
Code: STAN Academic License -- free for academic, non-profit, educational, and personal research use. Commercial use (including CROs, pharmaceutical companies, and biotech) requires a separate license. Contact bsphinney@ucdavis.edu for commercial licensing inquiries.
Copyright © 2026 Brett Stanley Phinney, UC Davis Proteomics Core; 🧙♂️ The Peptide Wizard.
Community benchmark dataset: CC BY 4.0
If STAN is useful for your work, please cite STAN and the search engines it depends on:
STAN:
Phinney BS. STAN: Standardized proteomic Throughput ANalyzer. UC Davis Proteomics Core (2026). https://github.com/bsphinney/stan
DIA-NN (DIA search engine):
Demichev V, Messner CB, Vernardis SI, Lilley KS, Ralser M. DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nature Methods. 2020;17:41-44. https://doi.org/10.1038/s41592-019-0638-x
Sage (DDA search engine):
Lazear MR. Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale. Journal of Proteome Research. 2023;22(11):3652-3659. https://doi.org/10.1021/acs.jproteome.3c00486
Points-across-peak quantitation quality metric:
Matthews DE, Hayes JM. Systematic Errors in Gas Chromatography-Mass Spectrometry Isotope Ratio Measurements. Analytical Chemistry. 1976;48(9):1375-1382.
Carafe (experiment-specific spectral library building):
Noble Lab. Carafe: calibrated retention time and ion mobility prediction for timsTOF. https://github.com/Noble-Lab/Carafe
timsplot (timsTOF visualization):
Kirsch Z. timsplot: interactive timsTOF data visualization. https://github.com/zack-kirsch/timsplot
MsBackendTimsTof (R/Bioconductor timsTOF backend):
Rainer J, et al. MsBackendTimsTof: Bioconductor backend for Bruker timsTOF data. https://github.com/rformassspectrometry/MsBackendTimsTof