satsignal.provenance.v1 — structured provenance ingest

Versioning (2026-05-20). This is satsignal.provenance.v1. The schema literal is fixed at "satsignal.provenance.v1". The shape evolves additively as v1.x: new fields are added as optional top-level keys, every existing v1.0 manifest canonicalizes to identical bytes, and unknown top-level keys remain a 400 — custom/vendor data lives in the explicit extensions object. Phase 4 (2026-05-20) lands the typed-authority block — see §8. Breaking shape changes would ship as satsignal.provenance.v2, never as a quiet v1 mutation.

Terminology. The canonical vocabulary on every surface is proof (grouped in folders); API responses emit the canonical names only (proof_id, proof_url, folder_slug). Where this spec shows receipt / matter / bundle_id, it is documenting a frozen on-disk/on-chain format, a stable filename (RECEIPTS.md), or a legacy route that remains accepted inbound — proof and receipt denote the same record. Full alias map: the compatibility map.

Current public names. The canonical vocabulary is the only one the server emits; the legacy wire tokens remain accepted inbound forever, so every existing client that sends them keeps working unchanged:

Sending both folder_slug and matter_slug with different values returns 400 conflicting_alias; identical values are accepted. The examples below use the canonical names; the full alias map is the compatibility map.
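The alias rule above can be sketched as a small resolver. This is an illustration only, not the server's code: the function name resolve_folder_slug and the AliasConflict exception are hypothetical; only the two key names and the conflicting_alias code come from this spec.

```python
from typing import Optional


class AliasConflict(ValueError):
    """Hypothetical error type carrying the spec's conflicting_alias code."""
    code = "conflicting_alias"


def resolve_folder_slug(body: dict) -> Optional[str]:
    """Return the folder slug from a request body, honoring the legacy alias.

    folder_slug is the canonical name; matter_slug is the legacy inbound alias.
    Identical values are accepted; different values are a conflict.
    """
    canonical = body.get("folder_slug")
    legacy = body.get("matter_slug")
    if canonical is not None and legacy is not None and canonical != legacy:
        raise AliasConflict("folder_slug and matter_slug differ")
    return canonical if canonical is not None else legacy
```

A client that only ever sends folder_slug never reaches the conflict branch.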

Wrapper status. POST /api/v1/provenance/anchor is not a separate proof system or a competing API. It is a thin structured wrapper around the same anchor primitive: it normalizes CI / build / package metadata into the canonical satsignal.provenance.v1 manifest, then anchors sha256(canonical manifest) as an ordinary standard anchor — mechanically identical to POST /api/v1/anchors. Nothing is stored or verified differently; mode: "provenance" is only an ingest acknowledgement (see §4).

This is the provenance-manifest implementer spec. For the user-facing overview, see satsignal.cloud/docs.html.

Status: draft 1, 2026-05-16. Audience: anyone wiring a CI job, build system, package publisher, agent runtime, or webhook into Satsignal who wants a structured record anchored — not just a raw byte blob. Goal: define one canonical manifest that every integration normalizes into, so adapters (GitLab CI, Bitbucket, Docker BuildKit, npm, PyPI, custom) stay thin and none of them invents its own shape.

1. Why this exists

Satsignal already has two ends of the provenance pipe:

satsignal.provenance.v1 is the missing middle: a small, structured, canonicalizable record (SCJ-v1, §3) that says what is being anchored, where it came from, who produced it, and which supply-chain attestations already cover it. An integration's only job is to fill this shape and POST it. The endpoint canonicalizes it, commits its SHA-256 on-chain, and ships the full canonical manifest back inside a .mbnt bundle so verification is reproducible offline — no Satsignal API call required.

Satsignal does not re-issue SLSA / in-toto / Sigstore / npm / PyPI attestations. It anchors a record that references them by digest. Keep using your supply-chain identity layer; this adds an independent, public, direct-to-chain timestamp around it.

2. The object

Canonical JSON. schema, source, and subject are required; verifiers and the ingest endpoint MUST reject if any required field is missing or malformed.

{
  "schema": "satsignal.provenance.v1",
  "source":  { "type": "gitlab", "id": "acme/widgets" },
  "subject": { "type": "commit", "digest": "sha256:9f86d0...c9c7" },

  "identity": {
    "provider": "gitlab",
    "actor": "ci-bot",
    "repo": "acme/widgets",
    "commit": "9f86d0...",
    "workflow_run": "pipeline/8123",
    "agent_id": null,
    "org_id": "acme"
  },
  "attestations": [
    { "type": "slsa", "digest": "sha256:1b4f0e...8a21" }
  ],
  "claims": { "stage": "release", "artifact": "widgets-1.4.2.tgz" },
  "privacy": { "onchain_mode": "hash_only", "public_fields": [] }
}

Required

Field | Type | Meaning
schema | string | MUST be the exact literal satsignal.provenance.v1.
source.type | string | The emitting system. One of: github, gitlab, bitbucket, docker, npm, pypi, langfuse, langsmith, otel, s3, webhook, custom. Use custom for systems not yet enumerated.
subject.type | string | What the digest covers. One of: commit, artifact, container, image, package, trace, prompt, file, webhook, release, eval, custom.
subject.digest | string | sha256:<64 lowercase hex> (a bare 64-hex string is accepted and normalized to the prefixed form). Only sha256 in v1.
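A client-side pre-flight check for the required fields might look like the sketch below. The function names are hypothetical, and one behavior is an assumption: uppercase hex is rejected rather than lowercased, since the spec only says that a bare lowercase digest is normalized to the prefixed form.

```python
import re

SOURCE_TYPES = ("github", "gitlab", "bitbucket", "docker", "npm", "pypi",
                "langfuse", "langsmith", "otel", "s3", "webhook", "custom")
SUBJECT_TYPES = ("commit", "artifact", "container", "image", "package", "trace",
                 "prompt", "file", "webhook", "release", "eval", "custom")
_HEX64 = re.compile(r"[0-9a-f]{64}")


def normalize_digest(value: str) -> str:
    """Return the prefixed form sha256:<64 lowercase hex>, or raise ValueError."""
    bare = value[len("sha256:"):] if value.startswith("sha256:") else value
    if not _HEX64.fullmatch(bare):
        # Assumption: uppercase or short digests are rejected, not repaired.
        raise ValueError("digest must be sha256:<64 lowercase hex>")
    return "sha256:" + bare


def check_required(manifest: dict) -> dict:
    """Validate required fields; return a copy with subject.digest normalized."""
    if manifest.get("schema") != "satsignal.provenance.v1":
        raise ValueError("schema must be the literal satsignal.provenance.v1")
    source, subject = manifest.get("source"), manifest.get("subject")
    if not isinstance(source, dict) or source.get("type") not in SOURCE_TYPES:
        raise ValueError("source.type missing or not enumerated")
    if not isinstance(subject, dict) or subject.get("type") not in SUBJECT_TYPES:
        raise ValueError("subject.type missing or not enumerated")
    digest = subject.get("digest")
    if not isinstance(digest, str):
        raise ValueError("subject.digest missing")
    out = dict(manifest)
    out["subject"] = {**subject, "digest": normalize_digest(digest)}
    return out
```

Running this before the POST turns a would-be 400 into a local error; the endpoint remains the authority on what is accepted.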

Optional

Field | Type | Meaning
source.id | string | Provider-scoped identifier of the source (repo path, registry name, bucket, …).
identity | object | Open string→string bag binding the run context: provider, actor, repo, commit, workflow_run, agent_id, org_id, or any adapter-defined key. Keys/values are length-capped and control-character-rejected.
attestations | array | Each { "type": ..., "digest": "sha256:..." }. type is one of: slsa, in-toto, github, npm, pypi, cosign, sigstore, custom. The digest points at an attestation produced elsewhere; Satsignal anchors the pointer, it does not validate the attestation.
claims | object | Opaque caller-defined map. Satsignal does not interpret it; it only commits it. Must be JSON with no floats (use strings or integers), bounded depth.
privacy | object | onchain_mode (hash_only default, or sealed; see §6) and optional public_fields (advisory list of dotted paths the caller considers non-sensitive). In sealed mode the manifest is HMAC-blinded client-side and never reaches the server (§6.1).

v1.x additive top-level keys (typed-authority block + extensions) are specified in §8. They are all OPTIONAL — an emitter that doesn't opt in produces identical canonical bytes to v1.0.

Unlike the chain-anchor-v1 embed envelope, unknown top-level keys are rejected, not ignored: this is an ingest endpoint, and the standing rule is that a misused anchor must fail loud (a 400), never quiet (an on-chain spend under a wrong shape). Custom/vendor data goes in the explicit extensions object (§8); arbitrary new top-level keys remain INVALID for canonical Satsignal semantics. Keys inside claims and inside extensions are caller-defined and intentionally open.

3. Canonicalization

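The manifest is canonicalized with Satsignal Canonical JSON v1 (SCJ-v1), the same rule used for the MBNT canonical doc (see /spec-mbnt §Canonicalization) and the manifest-items-v1 leaf preimage. In summary (as restated in §6.3), SCJ-v1 emits UTF-8 bytes with NFC-normalized strings, keys sorted by code point, and no whitespace; /spec-mbnt §Canonicalization carries the full rule.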

Re-running on a structurally-equivalent manifest (different key order, equivalent Unicode) yields identical bytes — the property the on-chain proof depends on.

SCJ-v1 is NOT RFC 8785 (JCS). Do not reach for an RFC 8785 / JCS library to verify a provenance manifest or any MBNT canonical doc: it will compute a different hash for two classes of input, and the anchor will appear to fail. The two rules diverge in two places: (1) SCJ-v1 sorts keys by code point (Python str order), whereas RFC 8785 §3.2.3 sorts by UTF-16 code unit, and these differ for supplementary-plane ("astral") keys (U+10000+); and (2) SCJ-v1 NFC-normalizes strings, which RFC 8785 does not. RFC 8785 JCS is used only by the selective-disclosure field profile (satsignal.json.field.v1, see /spec-disclosure), which is a distinct canonicalization context; do not assume the two are interchangeable. The reference SCJ-v1 implementations are notary/canonical.py (Python) and verifier/canon.mjs (JS); the cross-language byte-parity corpus (tests/vectors/provenance-v1/canonical_corpus.json) pins both, including the astral-key and NFC edge cases.

manifest_sha256 = sha256(scj_v1(normalized_manifest))

That hash is committed on-chain as a byte_exact standard anchor — mechanically identical to anchoring a file whose bytes are the canonical manifest.
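The properties named in this section (NFC strings, code-point key order, no whitespace, UTF-8) can be approximated with the Python standard library. The sketch below is not the reference implementation (that is notary/canonical.py), and the names scj_v1_sketch and manifest_sha256 are hypothetical. It also assumes non-ASCII characters are emitted raw rather than \u-escaped, which this section does not state; check any real implementation against the byte-parity corpus.

```python
import hashlib
import json
import unicodedata


def _nfc(value):
    """Recursively NFC-normalize every string (keys and values); reject floats."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, float):
        raise ValueError("floats are not allowed; use strings or integers")
    if isinstance(value, dict):
        return {_nfc(k): _nfc(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_nfc(v) for v in value]
    return value  # int, bool, None pass through unchanged


def scj_v1_sketch(manifest: dict) -> bytes:
    """Approximate SCJ-v1: NFC strings, code-point key order, no whitespace, UTF-8."""
    # Python compares str by code point, so sort_keys gives code-point order.
    # Assumption: non-ASCII is written raw (ensure_ascii=False), not escaped.
    text = json.dumps(_nfc(manifest), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def manifest_sha256(manifest: dict) -> str:
    """Hex SHA-256 of the canonical bytes, as in the formula above."""
    return hashlib.sha256(scj_v1_sketch(manifest)).hexdigest()
```

Two manifests that differ only in key order or in Unicode composition produce the same bytes here, and a key at U+FFFF sorts before one at U+10000, which is the opposite of UTF-16 code-unit order.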

4. Endpoint

POST /api/v1/provenance/anchor
Authorization: Bearer sk_...            (scope: anchors:create)
Content-Type: application/json

{ "folder_slug": "<slug>",
  "manifest":    { ...satsignal.provenance.v1... },
  "label":       "optional",
  "category":    "optional, default evidence_bundle",
  "session_id":  "optional off-chain grouping key",
  "force_new":   false }

Response 200:

{
  "proof_id": "…",
  "txid": "…",
  "mode": "provenance",
  "category": "evidence_bundle",
  "dry_run": false,
  "manifest_hash": "<64-hex sha256 of the canonical manifest>",
  "chain_anchor": { "v": 1, "system": "satsignal", "chain": "bsv-mainnet",
                    "txid": "…", "root_hash": "<64-hex>",
                    "category": "evidence_bundle", "anchor_id": "<proof_id>",
                    "workspace": "<slug>", "manifest_sha256": "<64-hex>" },
  "folder_slug": "…",
  "proof_url": "https://app.satsignal.cloud/w/…/r/…",
  "bundle_url": "https://app.satsignal.cloud/bundle/….mbnt",
  "retain_until": 1780704000
}

retain_until is Unix epoch seconds (UTC), never 0 — but for a plaintext provenance proof the value is bookkeeping, not a deletion schedule: the server-side copy is stored until you delete the proof (dashboard or request) — no automatic expiry is applied (retention policy: /dpa.html §04). (Sealed provenance follows the sealed retention model instead — §6.4: indefinite by default, retain_until: null.)

mode: "provenance" here is an ingest acknowledgement for the dedicated provenance endpoint — it is not a persisted anchor mode. The proof is stored as a standard-category anchor over sha256(canonical provenance manifest), so a later GET /api/v1/anchors, GET /api/v1/proofs/<id> (legacy /api/v1/receipts/<id>), or folder listing reads this same proof back as "mode": "standard". That is expected, not a bug. Detect a provenance proof from the manifest (schema: "satsignal.provenance.v1", its category, or the bundle's sidecar metadata) — never from mode alone. The load-bearing cryptographic facts are manifest_hash and the chain_anchor envelope below, not the POST mode label.

chain_anchor is a ready-to-embed chain-anchor-v1 object — drop it verbatim into a downstream proof and any chain-anchor-v1 verifier resolves it with no further work.

Idempotency. An identical canonical manifest re-POSTed to the same folder returns the prior proof ("duplicate": true) instead of creating a second anchor. force_new: true opts out (and counts against quota). Errors: 400 validation, 404 folder miss, 429 quota.
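For illustration, a plaintext submit can be assembled with the standard library alone. This is a sketch under stated assumptions: api_base is a placeholder the caller must supply (this section does not name the API host), and build_anchor_request and is_duplicate are hypothetical helper names.

```python
import json
import urllib.request


def build_anchor_request(api_base: str, api_key: str, folder_slug: str,
                         manifest: dict, force_new: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST described in this section."""
    body = {"folder_slug": folder_slug, "manifest": manifest, "force_new": force_new}
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/api/v1/provenance/anchor",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": "Bearer " + api_key,
                 "Content-Type": "application/json"},
    )


def is_duplicate(response_body: dict) -> bool:
    """True when the server returned the prior proof instead of a new anchor."""
    return response_body.get("duplicate") is True
```

Sending is then urllib.request.urlopen(request); urllib raises urllib.error.HTTPError for the 400 / 404 / 429 cases listed above, so callers can branch on its code attribute.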

5. How to verify (offline, stdlib-only)

This section is the hash_only verification walk (the default mode), where the manifest plaintext travels in the bundle. For sealed provenance — where the bundle carries no manifest plaintext and the holder presents (manifest + salt) out-of-band — see §6.6 instead.

The .mbnt bundle's proofs.json carries { scheme: "satsignal.provenance.v1", manifest, manifest_sha256, canonical_len }.

  1. Re-canonicalize proofs.json.manifest (SCJ-v1, §3). SHA-256 it. It MUST equal proofs.json.manifest_sha256. Mismatch → reject.
  2. Read canonical.json (the proof's canonical doc). Its subject.proofs.byte_exact.hash MUST equal that same SHA-256. Mismatch → the bundle's manifest is not the thing that was anchored → reject.
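  3. Fetch txid from any node/explorer. Extract the on-chain doc_hash (per SPEC_mbnt.md). It MUST equal sha256(canonical.json)[:40], that is, the first 40 hex characters (20 bytes) of the digest, matching the 20-byte truncation described in §6. Below confirmation policy → pending.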
  4. Satisfied → the provenance manifest existed in that exact form at or before the block time of txid.

What this proves

That whoever anchored it knew this exact manifest by that block time. It does not prove the referenced subject/attestation existed before then, nor that any embedded digest is itself valid — verify those through their own systems (Sigstore, SLSA, the registry). Satsignal supplies the independent, public timestamp around them. The canonical statement of the general framing — anchoring proves anchorer-knowledge by time T, not world-existence and not authorship — is in the bundle spec.

6. Privacy modes

privacy.onchain_mode selects how much the bundle reveals. Two modes ship in v1; both anchor the same on-chain shape (a byte_exact standard anchor over sha256(canonical_doc)[:20]) and are indistinguishable on chain.

onchain_mode | Manifest plaintext | Re-derivable by | Default
hash_only | travels in the .mbnt bundle (proofs.json.manifest), server-stored | anyone who holds the bundle | yes
sealed | never reaches the server and is not in the bundle; HMAC-blinded client-side; the manifest is the holder's bearer secret | only a holder who presents (manifest + salt) out-of-band | no (additive opt-in)

hash_only is the default and is unchanged: the canonical manifest's SHA-256 goes on-chain; the manifest itself travels in the .mbnt bundle (carried as proofs.json.manifest), which is access-controlled like any other Satsignal bundle. An unrecognized onchain_mode value is still rejected with a 400 (not a generic unknown-field error).

6.1 Sealed mode — privacy goal

Sealed provenance exists for the case where the existence and content of the provenance record is itself sensitive. Its single load-bearing property: the hosted server never holds the manifest plaintext. The client HMAC-blinds the manifest before it leaves the browser/process; only the salted commitment crosses the wire. This is the same bearer-secret / present-out-of-band model as sealed file and sealed disclosure anchors — see the sealed-anchor spec for the threat model and unsealing model, which apply verbatim to a sealed manifest.

A sealed manifest self-declares by carrying privacy.onchain_mode: "sealed" in its canonical bytes. That declaration is part of the SCJ-v1-canonicalized bytes the commitment is computed over (§6.3), so it is cryptographically bound to the record — not a transport flag.

6.2 Submit shape (sealed)

A sealed anchor POSTs the commitment block in place of manifest — the server receives a salted commitment, never the manifest:

POST /api/v1/provenance/anchor
Authorization: Bearer sk_...            (scope: anchors:create)
Content-Type: application/json

{ "folder_slug":           "<slug>",
  "byte_exact_commitment": "<64-hex HMAC-SHA256>",
  "salt_b64":              "<base64url(32-byte master salt), unpadded — omit for blind>",
  "file_size":             <int>,
  "retain_days":           30,           // salt present ⇒ mirror; omit for indefinite retention (the default) — see §6.4
  "label":                 "optional",
  "category":              "optional, default evidence_bundle",
  "session_id":            "optional off-chain grouping key",
  "sha256_hex":            "optional" }

There is no manifest field on a sealed submit — sending one declaring onchain_mode: "sealed" to the plaintext route is rejected (§6.7). The legacy matter_slug alias remains accepted inbound exactly as in §4.

6.3 Commitment

master_salt              = 32 random bytes (client-side, crypto.getRandomValues)
salt_b64                 = base64url(master_salt), padding stripped
canonical_manifest_bytes = SCJ_v1(normalized_manifest)     (UTF-8 NFC, codepoint-sorted keys, no whitespace — §3; NOT RFC 8785)
byte_exact_commitment    = HMAC-SHA256(master_salt, canonical_manifest_bytes)   (64-hex)

The canonicalization of canonical_manifest_bytes is identical to the hash_only path (§3); the only difference is that sealed HMAC-keys those bytes under master_salt instead of plain-SHA-256'ing them. The onchain_mode: "sealed" declaration is inside those bytes, so the commitment binds the mode.
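The commitment derivation can be sketched in Python, with secrets.token_bytes standing in for the browser's crypto.getRandomValues. The names seal_manifest and _canon are hypothetical, and _canon only approximates SCJ-v1 (§3); a production client should use the reference canonicalizer.

```python
import base64
import hashlib
import hmac
import json
import secrets
import unicodedata
from typing import Optional


def _canon(value) -> bytes:
    """Approximation of SCJ-v1 (see §3); not the reference code."""
    def nfc(v):
        if isinstance(v, str):
            return unicodedata.normalize("NFC", v)
        if isinstance(v, dict):
            return {nfc(k): nfc(x) for k, x in v.items()}
        if isinstance(v, list):
            return [nfc(x) for x in v]
        return v
    return json.dumps(nfc(value), sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def seal_manifest(manifest: dict, master_salt: Optional[bytes] = None) -> dict:
    """Return the byte_exact_commitment and salt_b64 for a sealed manifest."""
    privacy = manifest.get("privacy")
    if not isinstance(privacy, dict) or privacy.get("onchain_mode") != "sealed":
        raise ValueError("manifest must declare privacy.onchain_mode = sealed")
    salt = master_salt if master_salt is not None else secrets.token_bytes(32)
    if len(salt) != 32:
        raise ValueError("master_salt must be 32 bytes")
    commitment = hmac.new(salt, _canon(manifest), hashlib.sha256).hexdigest()
    salt_b64 = base64.urlsafe_b64encode(salt).decode("ascii").rstrip("=")
    return {"byte_exact_commitment": commitment, "salt_b64": salt_b64}
```

Only the returned values are candidates for the wire; the manifest itself stays with the holder, and for the blind shape the salt stays local too.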

6.4 Retention

Sealed provenance reuses the sealed-file retention model (see the sealed-anchor spec) — selected by salt_b64 + retain_days, not by a separate flag:

Shape | Request | Server retains | Bundle delivery
blind | omit salt_b64; retain_days 0/absent | nothing; the salt never enters the process | client assembles the .mbnt locally from the response
mirror (default retention) | salt_b64; omit retain_days | the .mbnt indefinitely, until the proof owner deletes it (retain_until: null in the response) | re-download from bundle_url any time the proof lives
mirror (explicit window) | salt_b64 + retain_days >= 1 | the .mbnt until the window lapses; the value is honored as given on every plan (no plan-tier ceiling) | re-download from bundle_url until retain_until, then reaped

The chain anchor is permanent regardless of tier; the holder's local .mbnt is the durable artifact.
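The selection rule in the table can be written as a small classifier. This is a sketch: the function name and the returned labels are hypothetical, and two cases the table does not cover are handled by assumption (salt_b64 with retain_days 0 is treated as default retention; retain_days >= 1 without a salt is refused).

```python
from typing import Optional


def retention_shape(salt_b64: Optional[str], retain_days: Optional[int]) -> str:
    """Classify a sealed submit as blind, mirror_default, or mirror_explicit."""
    days = retain_days or 0
    if days < 0:
        raise ValueError("retain_days must not be negative")
    if salt_b64 is None:
        if days >= 1:
            # Assumption: the table defines no retention window without a salt.
            raise ValueError("retain_days >= 1 requires salt_b64")
        return "blind"
    # Assumption: an explicit 0 with a salt behaves like an omitted retain_days.
    return "mirror_explicit" if days >= 1 else "mirror_default"
```

A client can use the result to decide whether to assemble the .mbnt locally (blind) or rely on bundle_url (either mirror shape).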

6.5 Bundle / proofs.json shape (sealed)

A sealed provenance .mbnt carries the sealed proof set and the salt, and crucially NO manifest plaintext key (contrast the hash_only bundle, whose proofs.json carries "manifest": <clean manifest> — §5). The sealed byte_exact commitment is carried in the bundle's canonical.json (subject.proofs.byte_exact):

// canonical.json subject.proofs (sealed) — note: NO "manifest" anywhere
{
  "byte_exact": {
    "algo":         "hmac-sha256",
    "salt_version": "salt_v1",
    "commitment":   "<64-hex>"
  }
}

The master_salt rides in the .mbnt manifest.json as salt_b64 with bearer_secret: true (same bearer-secret flag and renderer warning as sealed file bundles). On chain the record is an ordinary byte_exact sealed anchor over sha256(canonical_doc)[:20], indistinguishable from a sealed file anchor, and introducing NO new merkle scheme literal. The manifest plaintext is absent by design: it is the holder's bearer secret, not a bundle field.

6.6 Verification (out-of-band — the key difference from hash_only)

hash_only provenance verifies by re-canonicalizing the manifest from the bundle (§5 step 1). Sealed cannot — there is no manifest plaintext in the bundle. Instead the holder presents (manifest + salt) out-of-band, and the verifier:

  1. Re-normalize + SCJ-v1-canonicalize the presented manifest (§3) → canonical_manifest_bytes.
  2. Compute HMAC-SHA256(salt, canonical_manifest_bytes) and confirm it equals the bundle's canonical.json byte_exact.commitment. Mismatch → the presented manifest is not what was sealed → reject.
  3. Re-canonicalize canonical.json, SHA-256, truncate to 20 bytes; confirm it equals manifest.doc_hash_expected and walk it to the on-chain doc_hash per §5 steps 3–4.

A match on all three means: at the block time of txid, whoever anchored it provably knew this exact manifest, and the verifier learned it only because the holder chose to disclose (manifest + salt). Without that disclosure the chain reveals only that some provenance record was anchored at time T.

6.7 Guard — plaintext route rejects a sealed manifest

The plaintext provenance route (a manifest-bearing POST, §4) rejects any manifest declaring onchain_mode: "sealed", error sealed_manifest_on_plaintext_route. A sealed manifest MUST be HMAC-blinded client-side and submitted via the §6.2 commitment shape — it must never be POSTed as plaintext (the server must never SHA-256 and store a manifest that asked to be sealed).
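The same guard is worth running client-side before a manifest-bearing POST, so a sealed manifest never leaves the process as plaintext. A minimal sketch, with a hypothetical exception class and function name; only the error code string and the two mode names come from this spec.

```python
class SealedOnPlaintextRoute(ValueError):
    """Hypothetical error type carrying the spec's error code."""
    code = "sealed_manifest_on_plaintext_route"


def guard_plaintext_route(manifest: dict) -> None:
    """Raise if this manifest must not be sent on the plaintext route."""
    privacy = manifest.get("privacy")
    mode = "hash_only"  # the default when privacy.onchain_mode is absent
    if isinstance(privacy, dict):
        mode = privacy.get("onchain_mode", "hash_only")
    if mode == "sealed":
        raise SealedOnPlaintextRoute("seal client-side and use the commitment shape")
    if mode != "hash_only":
        raise ValueError("unrecognized onchain_mode")
```

Call it immediately before serializing the request body; a manifest that raises SealedOnPlaintextRoute belongs on the §6.2 submit shape instead.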

6.8 Coexistence

hash_only (plaintext, publicly re-derivable, server-stored manifest) stays the default and is unchanged. Sealed is purely additive and fully back-compatible: an emitter that does not set onchain_mode: "sealed" produces an identical hash_only record.

7. Adapters

Every integration is a thin translator into this object. The first external adapter is GitLab CI, a bash-only (sha256sum + curl + jq) pipeline include:

include:
  - remote: 'https://satsignal.cloud/gitlab-ci.satsignal.yml'

anchor:
  extends: .satsignal_anchor
  variables:
    SATSIGNAL_SUBJECT_PATH: dist/app-1.4.2.tgz

It fills source / subject / identity / attestations from GitLab's predefined variables (CI_PROJECT_PATH, CI_COMMIT_SHA, CI_PIPELINE_ID, CI_JOB_ID, runner context, optional SLSA/cosign attestation digest), POSTs §4, and saves the .mbnt as a job artifact. The file's header is its own usage guide.
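To make the translation concrete, here is a Python sketch of the kind of mapping such an adapter performs. It is not the shipped bash adapter: the function name is hypothetical, and the exact field mapping (including the pipeline/<id> form, modeled on the §2 example) is an assumption.

```python
import hashlib


def manifest_from_gitlab_env(env: dict, artifact_bytes: bytes) -> dict:
    """Build a satsignal.provenance.v1 manifest from GitLab predefined variables."""
    return {
        "schema": "satsignal.provenance.v1",
        "source": {"type": "gitlab", "id": env["CI_PROJECT_PATH"]},
        "subject": {
            "type": "artifact",
            # The subject digest covers the artifact's bytes.
            "digest": "sha256:" + hashlib.sha256(artifact_bytes).hexdigest(),
        },
        "identity": {
            "provider": "gitlab",
            "repo": env["CI_PROJECT_PATH"],
            "commit": env["CI_COMMIT_SHA"],
            # Assumed format, following the workflow_run value shown in §2.
            "workflow_run": "pipeline/" + env["CI_PIPELINE_ID"],
        },
    }
```

In a CI job, env would be os.environ and artifact_bytes the contents of the file named by SATSIGNAL_SUBJECT_PATH.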

The rest of the CI/registry cluster now ships — each the same thin translator (bash-only sha256sum + curl + jq, no SDK, no Satsignal account at verify time), differing only in where it reads its native metadata:

Docker / npm / PyPI are standalone scripts (registry- and build-tool-level, not a CI-platform pipeline file); Bitbucket mirrors GitLab's pipeline-file form. Each file's header is its own usage guide; all map only onto source / subject / identity / attestations and change nothing server-side.

8. Typed-authority block (v1.x additive, Phase 4 / 2026-05-20)

Eleven optional top-level fields land the NIST NCCoE software-agent identity/authorization shape on top of v1, plus an extensions object as the explicit vendor escape hatch. Every field is OPTIONAL: an emitter that does not opt in produces identical canonical bytes to a v1.0 manifest, so the on-chain hash is unchanged. Every field is also payload-free — either a small typed identifier or a digest pointer at a document that lives elsewhere. The actual credentials, policy documents, or signature blobs never travel through this manifest.

{
  "schema": "satsignal.provenance.v1",
  "source":  { "type": "github", "id": "acme/widgets" },
  "subject": { "type": "release", "digest": "sha256:..." },

  "authority":    { "type": "organization",   "id": "acme",
                    "name": "Acme Corp" },
  "principal":    { "type": "service-account","id": "deploy-bot@acme" },
  "organization": { "type": "company",        "id": "acme" },
  "agent":        { "type": "ci-runner",      "id": "gha-2.317",
                    "name": "GitHub Actions" },

  "delegation_grant_digest": "sha256:...",
  "scopes":                  ["release", "publish"],
  "policy_snapshot_digest":  "sha256:...",

  "run_scope":      { "type": "workflow", "id": "release.yml@v1.4.2",
                      "environment": "prod" },
  "capture_policy": { "type": "spans", "digest": "sha256:..." },

  "artifact_roles": [
    { "role": "input",  "subject_ref": "sha256:..." },
    { "role": "output", "subject_ref": "dist/widgets-1.4.2.tgz" }
  ],

  "signature_ref": { "type": "cosign", "digest": "sha256:...",
                     "location": "ghcr.io/acme/widgets:1.4.2.sig" },

  "extensions": {
    "com.acme.build": { "matrix": ["linux/amd64", "linux/arm64"] }
  }
}

Identity fields (shape {type, id, name?})

Field | type enum | Meaning
authority | developer, organization, ci, operator, third-party, custom | Who is asserting authority over this run (whose policy/grants cover it).
principal | user, service-account, agent, custom | On whose behalf the action ran.
organization | company, team, project, namespace, custom | Owning organization / namespace.
agent | ci-runner, build-bot, publisher, llm-agent, human-operator, custom | Software agent that produced the artifact. Distinct from principal (who) and identity.actor (free-form).

id is required and is the stable identifier (path, URI, account ID, …); name is an optional human-readable label. Both are length-capped and harness-checked. custom is the escape hatch for entities not yet enumerated.
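A validator for the four identity fields might look like the sketch below. The function name is hypothetical; rejecting extra keys and empty ids are assumptions drawn from the {type, id, name?} shape, and the length caps are omitted because their values are not given here.

```python
IDENTITY_TYPES = {
    "authority": ("developer", "organization", "ci", "operator",
                  "third-party", "custom"),
    "principal": ("user", "service-account", "agent", "custom"),
    "organization": ("company", "team", "project", "namespace", "custom"),
    "agent": ("ci-runner", "build-bot", "publisher", "llm-agent",
              "human-operator", "custom"),
}


def check_identity_field(field: str, value: dict) -> None:
    """Raise ValueError unless value fits {type, id, name?} for this field."""
    allowed = IDENTITY_TYPES[field]
    if not isinstance(value, dict):
        raise ValueError(field + " must be an object")
    if set(value) - {"type", "id", "name"}:
        # Assumption: keys outside {type, id, name} are not accepted.
        raise ValueError(field + " has unexpected keys")
    if value.get("type") not in allowed:
        raise ValueError(field + ".type is not in the enum")
    if not isinstance(value.get("id"), str) or not value["id"]:
        raise ValueError(field + ".id is required")
    if "name" in value and not isinstance(value["name"], str):
        raise ValueError(field + ".name must be a string")
```

For example, check_identity_field("agent", {"type": "ci-runner", "id": "gha-2.317"}) passes silently, while a principal with type "ci-runner" raises.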

Digest-only fields

Field | Shape | Meaning
delegation_grant_digest | sha256:<64-hex> | sha256 of the delegation / capability grant that authorized this run. Pointer only; the grant document is held by the issuer.
policy_snapshot_digest | sha256:<64-hex> | sha256 of an effective-policy snapshot. Pointer only.

Scope and capture

Field | Type | Meaning
scopes | array<string> | Short tags naming the actions authorized for this run (e.g. ["release", "publish"]). Open vocab; up to 32 entries; harness-checked.
run_scope | object | {type, id, environment?}: the intended boundary of the run. type is one of: workflow, deployment, session, task, build, evaluation, custom.
capture_policy | object | {type, digest?}: the events-capture rule the run was running under. type is one of: events, spans, metrics, all, custom. Optional digest pointer at the policy document.

run_scope is distinct from capture_policy: the first describes what is in scope, the second describes which events must be captured / anchored within that scope.

Artifact roles

Field | Type | Meaning
artifact_roles | array<object> | Each {role, subject_ref} tags a secondary artifact with its function in the run. role is one of: input, output, intermediate, producer, consumer, primary, custom. subject_ref is a free-form string (a name, URI, or digest the caller's verifier knows how to resolve). Up to 32 entries.

Outside signature reference

Field | Shape | Meaning
signature_ref | {type, digest, location?} | Pointer at an OUTSIDE / customer signature. type is one of: cosign, jws, verifiable-credential, x509, pgp, ssh, custom. digest is the sha256 of the signature artifact; location is an optional URI.

Important. signature_ref is not a Satsignal operator signature. Satsignal anchors the pointer; the signature itself, and any signing key material behind it, lives entirely outside this manifest. Verification reports remain the unsigned-deterministic kind — an operator signature, if ever added, is a convenience artifact, not the reason the proof is trusted.

Vendor escape hatch — extensions

Field | Type | Meaning
extensions | object | Caller-defined namespace → opaque JSON. The explicit unknown-data escape hatch: anything Satsignal does not interpret goes here.

Same canonicalization rules as claims: no floats (use strings or integers), bounded depth (≤ 6), every string leaf harness-checked and length-capped, up to 16 namespace keys at the top level. Top-level keys SHOULD be reverse-DNS namespaced (e.g. com.example.feature) to avoid cross-vendor collisions; this is convention, not enforced.

A conforming verifier MAY hash + preserve the canonical bytes of extensions but MUST label them as "not interpreted by Satsignal." A claim that lives inside extensions is never elevated to a Satsignal canonical claim.
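The structural limits on extensions (no floats, depth ≤ 6, up to 16 namespace keys) can be checked before submitting. The sketch below uses hypothetical names and assumes the extensions object itself counts as depth 1; how the server counts depth is not stated here, and the string length caps are omitted because their values are not given.

```python
MAX_DEPTH = 6
MAX_NAMESPACES = 16


def _walk(value, depth: int) -> None:
    """Reject floats anywhere and containers nested deeper than MAX_DEPTH."""
    if isinstance(value, float):
        raise ValueError("floats are not allowed; use strings or integers")
    if isinstance(value, (dict, list)):
        if depth > MAX_DEPTH:
            raise ValueError("extensions nested too deeply")
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            _walk(child, depth + 1)


def check_extensions(extensions: dict) -> None:
    """Raise ValueError if extensions breaks the structural limits."""
    if not isinstance(extensions, dict):
        raise ValueError("extensions must be an object")
    if len(extensions) > MAX_NAMESPACES:
        raise ValueError("too many namespace keys")
    # Assumption: the extensions object itself is depth 1.
    _walk(extensions, 1)
```

The reverse-DNS convention for namespace keys is deliberately not checked, since the spec describes it as convention rather than an enforced rule.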

Why typed fields and extensions

The split is deliberate. identity (v1.0) is an open string→string bag — useful for adapter-specific context that doesn't need to be interpreted by a verifier. The typed-authority block names the small set of concepts Satsignal will interpret (and treat consistently across emitters): who, on whose behalf, under what authority, with what scope, under which policy, with what outside signature. Anything outside that interpreted set goes through extensions, with the explicit "not interpreted by Satsignal" label so callers and verifiers know what level of validation is in play.

This block ships under the additive-update policy: new fields arrive as optional keys, and breaking changes ship as a new major version rather than a quiet v1 mutation.

Questions about this specification? Email hello@satsignal.cloud.