Context
At the 2026-04-21 meeting with Tim Clark and Casper of the TRACE project, they explicitly flagged compute / runtime-environment capture as their next likely TROv vocabulary increment (transcript lines 503-523). Tim:
"I think Tim you sketched out an example of something that isn't yet formalized but it'll very likely be the next point release of trace to include one version of what computing architecture operating system software or whatever you can capture that goes into the trace with calendar isn't just an idiosyncratic add on."
For PolicyEngine, this matters because:
- Stochastic imputation (quantile regression forests, QRF) is reproducible within a pinned numpy version, but we have not guaranteed determinism across numpy versions. The TRO should record the exact Python / numpy / cloud-region combination that produced a given h5 or simulation result.
- Modal-hosted builds run on specific GPU / compute pool configurations. The build-TROs we emit today (us-data PR #746) record pe:ciRunUrl and pe:ciGitSha but not the container image SHA, Python version, or cloud region at execution time.
- Webapp-run TROs (api#3485) will face the same gap. A CI/deploy SHA documents how a container was built, not which container was running when a specific request was served.
What to build
- Extend pe: attestation fields to cover:
  - Container image SHA (not just build commit SHA).
  - Python version.
  - Relevant library versions with nondeterminism risk (numpy, scikit-learn, quantile-forest, huggingface_hub).
  - Cloud region / compute-pool identifier.
  - Instance / pod ID at execution time.
- Populate these at emission time both in us-data's build-TRO emission and in the webapp-run TRO emission scoped by api#3485.
- Offer the generalized subset upstream to TROv. Some of these fields (image SHA, Python version, region) are likely useful for any statistical-agency-style use case; others (quantile-forest version) are PolicyEngine-specific.
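
A minimal sketch of what capturing these fields at emission time could look like. The pe:* key names below (other than the existing pe:ciRunUrl and pe:ciGitSha, which are not shown) and the environment-variable names PE_CONTAINER_IMAGE_SHA, PE_CLOUD_REGION, and PE_INSTANCE_ID are hypothetical; the real names depend on how the Modal and webapp deployments expose this information.

```python
import os
import platform
import socket
from importlib import metadata

# Distributions this issue names as carrying nondeterminism risk.
TRACKED_PACKAGES = ("numpy", "scikit-learn", "quantile-forest", "huggingface_hub")


def package_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def capture_runtime_environment(env=None):
    """Collect runtime-environment fields when a TRO is emitted.

    All pe:* keys and environment-variable names here are hypothetical
    placeholders, not an agreed vocabulary.
    """
    env = os.environ if env is None else env
    return {
        # Digest of the image actually running, as opposed to the build commit SHA.
        "pe:containerImageSha": env.get("PE_CONTAINER_IMAGE_SHA"),
        "pe:pythonVersion": platform.python_version(),
        "pe:libraryVersions": {
            name: package_version(name) for name in TRACKED_PACKAGES
        },
        "pe:cloudRegion": env.get("PE_CLOUD_REGION"),
        # Fall back to the hostname when no explicit instance ID is provided.
        "pe:instanceId": env.get("PE_INSTANCE_ID") or socket.gethostname(),
    }
```

Reading the values when the TRO is emitted, rather than at build time, is what separates "how the container was built" from "which container was running". Packages that are not installed are recorded as None instead of raising, so the same helper could run in both the us-data build and the webapp.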
Non-goals
- Not pinning every transitive Python dependency. TRACE has explicitly not built that in (transcript 399-403) and we should not either. Scope is limited to the nondeterminism-relevant subset.
- Not blocking on TRACE formalizing runtime-environment fields; we use pe:* in the interim and migrate to TROv when ready.
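
The interim-then-migrate approach could be as small as a key rename once TROv publishes its terms. The sketch below assumes a flat record; the trov:* names, the pe:* names, and the version strings are invented placeholders, since TRACE has not yet formalized these fields.

```python
def migrate_to_trov(record, mapping):
    """Rename pe:* keys that have a TROv equivalent; leave all other keys as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Placeholder mapping: these trov:* names are illustrative only.
EXAMPLE_MAPPING = {
    "pe:pythonVersion": "trov:examplePythonVersion",
    "pe:cloudRegion": "trov:exampleRegion",
}

# Example record with made-up values.
record = {
    "pe:pythonVersion": "3.11.9",
    "pe:cloudRegion": "us-east-1",
    "pe:quantileForestVersion": "1.3.4",  # PolicyEngine-specific, stays under pe:
}

migrated = migrate_to_trov(record, EXAMPLE_MAPPING)
```

Fields without an upstream equivalent keep their pe: prefix, which matches the split between the generalizable subset and the PolicyEngine-specific fields.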
Related
- docs/trace-case-study.md (PR "Add TRACE case study writeup for AEA / TRACE grant team" #315; flags this as an adjacent workstream)
- /tmp/aea-review/transcript.txt on this laptop