Emit institutionally-signed TRACE TRO for every webapp simulation run #3485

@MaxGhenis

Context

Today's working meeting with Lars Vilhuber (AEA Data Editor), Tara Watson, John Sabelhaus, and Tim Clark / Casper of the TRACE project reframed where a TRACE Transparent Research Object (TRO) adds value for PolicyEngine.

Lars, Tim, and Casper converged on a specific answer: TRACE's value for us is institutional certification of runs researchers cannot easily re-run themselves. For PolicyEngine, the two places that fits:

  1. The us-data microdata build (us-data PR #746, "Update PolicyEngine US to 0.462.2", already emits a build-TRO per release ✓)
  2. The policyengine.org simulation runs — because the webapp runs the simulation on infrastructure and against data (the calibrated enhanced CPS .h5) that the researcher does not fully control. This issue is about that second surface.

Lars explicitly said TRACE does not primarily serve the researcher-running-policyengine-locally case — in that case the reader can just pip install the same versions and rerun without a TRO.

What to build

When a simulation runs through the policyengine-api (household-level via /calculate, or economy-wide via /economy/{country_id}/over/{policy_id}), the server should emit a signed TRACE TRO that binds:

  • Software: policyengine, policyengine-{country}, policyengine-{country}-data wheel SHA-256 + versions. Not the full installed Python-package transitive list — TRACE has explicitly not built transitive-dep tracing in (per the 2026-04-21 meeting with Tim/Casper); a verifier who wants the full environment can resolve declared dependencies against a public index.
  • Data: the HF-hosted enhanced_cps_*.h5 (or UK / Canada equivalent) SHA-256 + DataReleaseManifest SHA-256
  • Reform: the reform JSON the user submitted, content-hashed
  • Inputs: the household JSON (for /calculate) or the simulation config (for /economy/...), content-hashed
  • Results: a content-hashed results.json with the aggregate metrics we currently return to the UI. Whether to additionally bind a full per-household weighted output frame (parquet) is an open design question — see below.
  • Institutional attestation: CI/deployment run URL, git SHA, cloud region, timestamp, and a signature by a PolicyEngine service account so the TRO is verifiably "PolicyEngine ran this" not "someone with a fork ran this"

All hashes are computed over canonical-JSON-normalized bytes produced by policyengine.provenance.trace.canonical_json_bytes, so third-party validators can reproduce them.
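
A minimal sketch of the content-hashing step described above. The function canonical_json_bytes_sketch is a stand-in and only an assumption about what policyengine.provenance.trace.canonical_json_bytes does (sorted keys, compact separators, UTF-8); the real helper may normalize differently. The names content_hash and bind_run_artifacts are hypothetical and exist only for illustration.

```python
import hashlib
import json
from typing import Any


def canonical_json_bytes_sketch(obj: Any) -> bytes:
    """Stand-in for policyengine.provenance.trace.canonical_json_bytes.

    Assumption: sorted keys, compact separators, UTF-8 output.
    The real helper may apply different normalization rules.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def content_hash(obj: Any) -> str:
    """Return the SHA-256 hex digest of the canonical bytes of a JSON-like object."""
    return hashlib.sha256(canonical_json_bytes_sketch(obj)).hexdigest()


def bind_run_artifacts(reform: dict, inputs: dict, results: dict) -> dict:
    """Hypothetical helper: collect the per-request hashes a TRO would bind."""
    return {
        "reform_sha256": content_hash(reform),
        "inputs_sha256": content_hash(inputs),
        "results_sha256": content_hash(results),
    }
```

Because the keys are sorted before hashing, two submissions of the same reform with different key order produce the same digest, which is the property a third-party validator relies on.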

Storage and retrieval

Each emitted TRO is a JSON-LD document of a few KB. Persist it:

  • In GCS / object storage under traces/{country}/{year}/{run-id}.trace.tro.jsonld with a durable public URL
  • Indexed in the policyengine-api database so the result page can fetch its own TRO
  • Retention: indefinite (these are citations; they must never 404)
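
The object-key layout above could be generated along these lines. This is a sketch, not the API's actual code: tro_object_key and tro_public_url are hypothetical names, the run-id character set is an assumption, and the base URL is supplied by the caller because the issue does not fix one.

```python
import re

# Assumption: run ids are restricted to URL-safe characters.
_RUN_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def tro_object_key(country: str, year: int, run_id: str) -> str:
    """Build the storage key traces/{country}/{year}/{run-id}.trace.tro.jsonld."""
    if not country.isalpha():
        raise ValueError(f"unexpected country id: {country!r}")
    if not _RUN_ID_PATTERN.fullmatch(run_id):
        raise ValueError(f"unexpected run id: {run_id!r}")
    return f"traces/{country.lower()}/{year}/{run_id}.trace.tro.jsonld"


def tro_public_url(base_url: str, key: str) -> str:
    """Join a caller-supplied public base URL with the object key."""
    return f"{base_url.rstrip('/')}/{key}"
```

Validating the path components before building the key keeps a malformed run id from producing a URL that later fails to resolve, which matters if these URLs end up in citations.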

The TRO URL lives in the simulation result JSON returned to the frontend — see the companion issue on policyengine-app for the download button and version badge UX.

Non-goals (per meeting)

Additional requirements surfaced on review

Codex review of this issue (2026-04-21, against the meeting transcript) flagged that the original scope conflates "institutional attestation metadata" with "institutional certification." They are not the same. To actually deliver the latter, this issue needs:

  • Durable storage commitment. The TRO URL in a paper citation must resolve forever. "Persist to GCS" is not a commitment; it is an implementation. The policy needs to specify: retention duration (indefinite), content-addressing scheme, a migration plan for bucket / region / provider changes, and a URL resolver that survives service rewrites. Zenodo deposit as a durable mirror is on Lars's mind (the meeting flagged HuggingFace lacking a clear preservation policy, and pointed at Zenodo as the reference pattern for this kind of artifact) — worth considering at least as a secondary location.
  • Verifier-facing trust model for the signature/key. A PolicyEngine service-account signature is only as credible as a reader's ability to verify (a) the key actually belongs to PolicyEngine, (b) it was not rotated between emission and verification without a traceable chain, and (c) the signing service itself cannot be spoofed. Possible answers: GCP workload-identity + short-lived signatures, a published keychain rooted in a DNS TXT record at policyengine.org, or a Sigstore-style transparency log. Needs an explicit design choice, not just "sign with a service account."
  • Binding to the actual production runtime and request, not merely CI/deploy metadata. The TRO should attest "this specific request, running on this specific container image / function version, at this specific time, produced these outputs." CI run URL + git SHA documents how the container was built; the TRO also needs to bind the running container image SHA + region + pod / function instance at the time of execution.
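
To make the distinction between build-time and execution-time provenance concrete, here is a rough sketch of an attestation record that carries both. Every field name and environment key here (CI_RUN_URL, IMAGE_DIGEST, and so on) is hypothetical; the issue does not specify how the runtime exposes these values, and the record would still need to be signed under whichever trust model is chosen.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeAttestation:
    # Build-time provenance: how the container was built.
    ci_run_url: str
    git_sha: str
    # Execution-time provenance: what actually served this request.
    image_digest: str
    region: str
    instance_id: str
    request_id: str
    executed_at: str  # ISO 8601 timestamp in UTC


# Hypothetical environment keys; a real deployment would define its own.
_REQUIRED = ("CI_RUN_URL", "GIT_SHA", "IMAGE_DIGEST", "REGION", "INSTANCE_ID")


def build_runtime_attestation(
    env: Mapping[str, str], request_id: str, now: Optional[datetime] = None
) -> dict:
    """Assemble the attestation fields, refusing to emit a partial record."""
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing attestation fields: {missing}")
    moment = now or datetime.now(timezone.utc)
    return asdict(
        RuntimeAttestation(
            ci_run_url=env["CI_RUN_URL"],
            git_sha=env["GIT_SHA"],
            image_digest=env["IMAGE_DIGEST"],
            region=env["REGION"],
            instance_id=env["INSTANCE_ID"],
            request_id=request_id,
            executed_at=moment.isoformat(),
        )
    )
```

Failing on a missing field, rather than emitting an empty string, is a deliberate choice in this sketch: a TRO with a blank image digest would look complete to a casual reader while attesting nothing about the runtime.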

On "institutional self-certification"

The meeting transcript is explicit that an institution certifying its own runs "carries technically no difference" from an author certifying their own runs — the arms-length property is lost. Our value comes from institutional reputation and from providing structured evidence that a verifier can query, not from a cryptographic equivalent of arms-length independence. The issue should avoid language that oversells this. We are producing an institution-backed self-attestation; that is valuable and aligns with TRACE's current scope, but it is not arms-length third-party certification.

Prerequisites

  • Do not embed TRO emission in the end-user policyengine Python package as a default researcher-facing feature. policyengine trace-tro CLI already exists; that's fine for the build process and power users, but the researcher-laptop case is not where TRACE adds value.
  • Do not rebuild the TRO schema. policyengine.provenance.trace.build_trace_tro_from_release_bundle + build_simulation_trace_tro already emit canonical TROv 0.1. The work here is wrapping those in the API request lifecycle + signing + persistence.

Open design questions

  • Per-household frame default: always include in the TRO, opt-in, or opt-out? The meeting transcript does not reach consensus on this. Max raised it, Sabelhaus noted the app already produces both the full counterfactual microdata and the summary statistics, but Lars/Tim/Casper did not endorse "make the full frame the TRO default." Design choice here should be ours, made with explicit trade-offs listed: TRO file size, downstream-analysis utility, privacy posture in UK-style restricted-data cases.
  • Signing mechanism: GCP workload-identity-based signing of the TRO bytes, GPG key, or just rely on the attestation fields being bound to an auditable CI/deploy provenance chain?
  • Back-population: should we retroactively emit TROs for simulations already stored, or only emit going forward?

Dependencies

Blocker: policyengine-api must migrate to policyengine>=4.0 before this issue can be implemented. The current pin is policyengine>0.12.0,<1 (pre-v4 orchestrator), and the API imports pre-v4 modules such as policyengine.simulation.SimulationOptions that no longer exist in current policyengine.py. The TRACE emission helpers (policyengine.provenance.trace.*) exist only in v4. See the separate migration issue filed as a prerequisite.

  • Requires policyengine>=4.3.1, which exposes build_simulation_trace_tro (already shipped)
  • Requires policyengine-us-data>=1.85.2, whose DataReleaseManifest ships alongside every HF h5 upload (already shipped)
  • Requires policyengine.py PR #314, "Bump policyengine-us to 0.266.0", to have merged (TROv 0.1 canonical namespace migration)

Related
