Headless redaction — build a disclosure offline, no upload, no key

Satsignal's Disclosure Redaction Tool turns (an original file + its anchor .mbnt) into a validated redacted copy plus a disclosure .mbnt whose revealed units prove into the anchor's committed merkle root (whose 20-byte doc_hash is what's on chain). The hosted version of the tool lives in your workspace dashboard — sign in, open the folder, then Disclosure Redaction Tool (/w/<workspace>/m/<folder>/disclosure-builder) — and runs entirely in your browser; this guide runs the same pure modules under Node, so you can redact programmatically — in CI, in a batch job, behind your own air-gap. There is deliberately no hosted redaction endpoint: redaction needs the original file bytes (and, for sealed proofs, the master salt), and uploading those would break the privacy boundary the whole feature exists to protect.

Companion docs: API reference · Disclosure spec · Bundle spec — canonical schemes · Sealed mode · What to hash · Compatibility map

1. The 60-second framing

You already anchored a file — a CSV, a newline-delimited log, or a JSON object — in standard or sealed mode. The anchor's .mbnt records a per-unit merkle root in the bundle (one leaf per CSV row / text line / JSON key); its 20-byte doc_hash is what's anchored on chain. Later you want to hand someone a partial copy: reveal a few units, withhold the rest, and let them verify that what you revealed is exactly what you anchored — without showing them the withheld units and without re-anchoring.

That's a disclosure. This guide builds one locally:

(original file bytes)  +  (source .mbnt)   ──►   redacted copy  +  disclosure .mbnt
        local                  local                  local            local

Nothing is uploaded. There is no API key and no network call in the redaction step. The original file, the per-leaf salts (sealed), and your reveal selection never leave the machine.

Why no hosted endpoint

Anchoring is a different shape: when you anchor, you send the notary only a sha256 (or, sealed, an HMAC) plus per-chunk hashes — never the bytes. The notary can't reconstruct your file from those. Redaction is the opposite: it needs the original bytes to recompute every leaf and prove the revealed ones, and a sealed source additionally needs the master salt (the bearer secret). A hosted redaction service would have to receive both — exactly the two things the proof system is designed never to transmit. So redaction is client-side, forever. This guide is how you run that client side without a browser.

2. Prerequisites

https://proof.satsignal.cloud/static/disclosure-builder/redact-core.mjs
https://proof.satsignal.cloud/static/disclosure-builder/bundle.mjs
https://proof.satsignal.cloud/static/disclosure-builder/verify-disclosure.mjs
https://proof.satsignal.cloud/static/disclosure-builder/merkle.mjs
https://proof.satsignal.cloud/static/disclosure-builder/hex.mjs
https://proof.satsignal.cloud/static/disclosure-builder/base64.mjs
https://proof.satsignal.cloud/static/disclosure-builder/preimage.mjs
https://proof.satsignal.cloud/static/disclosure-builder/csv-row-v1.mjs
https://proof.satsignal.cloud/static/disclosure-builder/csv-row-v1-native.mjs
https://proof.satsignal.cloud/static/disclosure-builder/csv-column-v1-native.mjs
https://proof.satsignal.cloud/static/disclosure-builder/text-line-v1-native.mjs
https://proof.satsignal.cloud/static/disclosure-builder/text-tree-v1-native.mjs
https://proof.satsignal.cloud/static/disclosure-builder/json-keypath-v1-native.mjs
https://proof.satsignal.cloud/static/disclosure-builder/json-ast-v1-native.mjs

Your source file type selects which chunk_merkle.scheme / leaf rule the carrier uses at run time — but all are imported, so all must be present on disk:

source file type                 carrier chunk_merkle.scheme  leaf module
CSV by row (header + data rows)  csv-row-v1                   csv-row-v1-native.mjs
CSV by column (header excluded)  csv-column-v1                csv-column-v1-native.mjs
newline-delimited text           text-line-v1                 text-line-v1-native.mjs
JSON object (key/value)          json-keypath-v1              json-keypath-v1-native.mjs

These modules import each other with relative specifiers (e.g. ./hex.mjs; csv-row-v1-native.mjs pulls in csv-row-v1.mjs and preimage.mjs), so once they are fetched together into one directory, every relative import resolves. Pin the bytes and re-fetch only on a known version. redact-core.mjs is the only entry point you import for redaction itself (§5 also imports bundle.mjs and verify-disclosure.mjs for packaging and self-verify); the rest are pulled in transitively.
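
One way to follow the fetch-and-pin advice is a small vendoring script. This is a minimal sketch, not an official tool: it assumes a Node version with global fetch, and the output directory name, the PINS.json file, and the --fetch flag are hypothetical choices made for illustration. It downloads the 14 modules listed above into one directory and records each file's sha256, so a later re-fetch can be compared against the pinned bytes.

```javascript
// vendor-modules.mjs (hypothetical name): node vendor-modules.mjs --fetch
import { createHash } from "node:crypto";
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

const BASE = "https://proof.satsignal.cloud/static/disclosure-builder/";
const MODULES = [
  "redact-core.mjs", "bundle.mjs", "verify-disclosure.mjs", "merkle.mjs",
  "hex.mjs", "base64.mjs", "preimage.mjs", "csv-row-v1.mjs",
  "csv-row-v1-native.mjs", "csv-column-v1-native.mjs",
  "text-line-v1-native.mjs", "text-tree-v1-native.mjs",
  "json-keypath-v1-native.mjs", "json-ast-v1-native.mjs",
];

// Absolute URL for every module in the list.
function moduleUrls(base = BASE, names = MODULES) {
  return names.map((name) => new URL(name, base).href);
}

function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Download every module into outDir and write a name -> sha256 map next to them.
async function vendor(outDir = "disclosure-builder", fetchImpl = fetch) {
  await mkdir(outDir, { recursive: true });
  const pins = {};
  for (const name of MODULES) {
    const res = await fetchImpl(new URL(name, BASE));
    if (!res.ok) throw new Error(`${name}: HTTP ${res.status}`);
    const bytes = new Uint8Array(await res.arrayBuffer());
    await writeFile(join(outDir, name), bytes);
    pins[name] = sha256Hex(bytes);
  }
  await writeFile(join(outDir, "PINS.json"), JSON.stringify(pins, null, 2) + "\n");
  return pins;
}

// Network access happens only when explicitly requested.
if (process.argv.includes("--fetch")) {
  vendor().then((pins) => console.log(pins));
}
```

Committing PINS.json alongside the vendored files makes an unexpected upstream change show up as a diff at the next re-fetch. Note this step is the only one that touches the network; the redaction itself stays offline.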

> app.mjs, disclosure-pack.mjs, and match-candidates.mjs are browser-UI glue (file pickers, download triggers, the render-mode toggle, candidate matching). You do not need them headless: this guide reproduces what they do in plain Node, and they are the only .mjs files in the directory you can skip.

3. What you read out of the source .mbnt

An .mbnt is a ZIP. Open it and pull three members:

member          what you take from it                                                                                 passed as
canonical.json  the whole file, byte-for-byte; its sha256 (sliced to the commit width) is the on-chain document hash  canonicalJson (parsed); keep the raw bytes for verify
proofs.json     merkle_leaves (the committed per-unit leaf hashes)                                                    proofsJson
manifest.json   txid + bundle_id (or proof_id, when present) → the anchor reference                                   anchorRef
manifest.json   sealed only: salt_b64 → the 32-byte master salt                                                       masterSaltBytes

bundle_id is REQUIRED by the redaction core. Pass a non-empty string or buildRedactDisclosure throws RedactBindingError: anchorRef.bundle_id must be a non-empty string. (bundle_id is the proof id's spelling in the frozen on-disk .mbnt format.) Server-emitted source manifest.json files carry only txid — no bundle_id/proof_id — so you must read the value from your anchor API response's proof_id (the POST /api/v1/anchors result) and pass it as anchorRef.bundle_id. Keep that proof_id from anchor time; you cannot recover it from a server source bundle. Cryptographically the disclosure still binds via txid + the committed merkle root (anchored on chain as its 20-byte doc_hash); bundle_id is recorded in linked_anchor and the core requires it present.

Carrier is verbatim. canonical.json is the bytes whose sha256 (sliced to the on-chain commit width) is anchored. Never re-serialize it — read the raw bytes, parse a copy for canonicalJson, and keep the originals for the on-chain comparison and for the disclosure bundle's carrier member.
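
If a system unzip is not available, the members can be read with Node's built-in zlib alone. The following is a minimal sketch under stated assumptions: readZipMembers is a hypothetical helper name, and it assumes the .mbnt is a plain ZIP whose members are either stored or deflated (no ZIP64, no encryption). It returns each member's raw bytes, which is what the verbatim-carrier rule above needs for canonical.json.

```javascript
import { inflateRawSync } from "node:zlib";

// Read every member of a small ZIP into a Map of name -> Uint8Array.
function readZipMembers(zipBytes) {
  const buf = Buffer.from(zipBytes.buffer, zipBytes.byteOffset, zipBytes.byteLength);

  // Locate the end-of-central-directory record by scanning backwards.
  let eocd = buf.length - 22;
  while (eocd >= 0 && buf.readUInt32LE(eocd) !== 0x06054b50) eocd--;
  if (eocd < 0) throw new Error("not a ZIP: end-of-central-directory record missing");

  const count = buf.readUInt16LE(eocd + 10);
  let p = buf.readUInt32LE(eocd + 16); // offset of the central directory
  const members = new Map();

  for (let i = 0; i < count; i++) {
    if (buf.readUInt32LE(p) !== 0x02014b50) throw new Error("bad central directory entry");
    const method = buf.readUInt16LE(p + 10);
    const compressedSize = buf.readUInt32LE(p + 20);
    const nameLen = buf.readUInt16LE(p + 28);
    const extraLen = buf.readUInt16LE(p + 30);
    const commentLen = buf.readUInt16LE(p + 32);
    const localOffset = buf.readUInt32LE(p + 42);
    const name = buf.toString("utf8", p + 46, p + 46 + nameLen);

    // The local header repeats the name and carries its own extra field.
    const dataStart = localOffset + 30 +
      buf.readUInt16LE(localOffset + 26) + buf.readUInt16LE(localOffset + 28);
    const raw = buf.subarray(dataStart, dataStart + compressedSize);

    if (method === 0) members.set(name, new Uint8Array(raw));                      // stored
    else if (method === 8) members.set(name, new Uint8Array(inflateRawSync(raw))); // deflate
    else throw new Error(`${name}: unsupported compression method ${method}`);

    p += 46 + nameLen + extraLen + commentLen;
  }
  return members;
}

// Usage (file name is an example):
//   const members = readZipMembers(readFileSync("employees.mbnt"));
//   const carrierBytes = members.get("canonical.json"); // keep these bytes verbatim
//   const canonicalJson = JSON.parse(Buffer.from(carrierBytes).toString("utf8"));
```

The sketch does not check CRC-32 values; the binding checks in the redaction core are what catch a mismatched or edited pair.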

4. The core call

Everything routes through one function in redact-core.mjs:

const out = await buildRedactDisclosure({
  originalFileBytes,      // Uint8Array — the raw original file
  canonicalJson,          // parsed source canonical.json
  proofsJson,             // { scheme, merkle_leaves:[...], metadata:{...} }
  selectedLeafIndices,    // number[] — 0-based unit indices to REVEAL
  anchorRef,              // { txid, bundle_id } — bundle_id REQUIRED (= proof_id from the anchor response; see §3)
  // masterSaltBytes,     // Uint8Array(32) — REQUIRED iff source is sealed
  // renderMode,          // "drop" | "mask"; multi-mode profiles only (see §6)
});

What it does, in order:

  1. Recomputes the native leaves from originalFileBytes using the leaf rule the carrier's chunk_merkle.scheme pins (and, for a sealed carrier, the HKDF per-leaf-salt + HMAC rule keyed by your master salt).
  2. Hard-fails (RedactBindingError) if the recomputed leaves or the recomputed duplicate-last root don't match what the bundle commits — i.e. the file and the .mbnt aren't a matching pair, the file was edited, or (sealed) the master salt is wrong.
  3. Builds a duplicate-last proof path for each revealed unit.
  4. Assembles a satsignal.disclosure.v1 block (linked_anchor + revealed[] + claims + presentation).
  5. Renders the redacted copy bytes (revealed units in place, withheld units replaced by the marker / dropped).
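
Steps 2 and 3 rely on a duplicate-last merkle tree: when a level has an odd number of nodes, the last node is paired with a copy of itself. The sketch below illustrates that shape only. It assumes each parent is sha256(left || right) over the raw digests, which is an assumption made for illustration; the real node and leaf rules are pinned by merkle.mjs and the Disclosure spec, and buildProof / verifyProof are hypothetical names.

```javascript
import { createHash } from "node:crypto";

// ASSUMPTION: parent = sha256(left || right) over raw 32-byte digests.
const hashPair = (left, right) =>
  createHash("sha256").update(left).update(right).digest();

// Build the root and the sibling path for one leaf index.
function buildProof(leafHashes, index) {
  if (!Number.isInteger(index) || index < 0 || index >= leafHashes.length) {
    throw new RangeError(`leaf index ${index} out of range`);
  }
  let level = leafHashes.map((h) => Buffer.from(h));
  let i = index;
  const path = [];
  while (level.length > 1) {
    if (level.length % 2 === 1) level.push(level[level.length - 1]); // duplicate-last
    const sibling = i ^ 1;
    path.push({ hash: level[sibling], side: sibling < i ? "left" : "right" });
    const next = [];
    for (let k = 0; k < level.length; k += 2) next.push(hashPair(level[k], level[k + 1]));
    level = next;
    i >>= 1;
  }
  return { root: level[0], path };
}

// Fold a leaf up its path and compare with the committed root.
function verifyProof(leafHash, path, root) {
  let acc = Buffer.from(leafHash);
  for (const { hash, side } of path) {
    acc = side === "left" ? hashPair(hash, acc) : hashPair(acc, hash);
  }
  return acc.equals(root);
}

// Five leaves: the fifth is paired with a copy of itself on the first level.
const leaves = ["a", "b", "c", "d", "e"].map((s) => createHash("sha256").update(s).digest());
const { root, path } = buildProof(leaves, 4);
console.log(verifyProof(leaves[4], path, root)); // true
```

A verifier needs only the revealed leaf, its path, and the root, which is why withheld units never have to be shown.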

Returns:

out.disclosureBlock      // the satsignal.disclosure.v1 block (-> bundle manifest)
out.redactedCopyBytes    // Uint8Array — the redacted view to ship
out.rootHex              // the committed merkle root it bound to (on chain as its 20-byte doc_hash)
out.dataRows, out.leafHashes

A bad selection (empty, all-units, or out-of-range index) also raises RedactBindingError — catch it and surface the message.
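
A caller can reject an obviously bad selection before invoking the core and turn a binding failure into a message. This is a hedged sketch: checkSelection and tryBuild are hypothetical helpers, the unit count is assumed to be proofsJson.merkle_leaves.length, and the check on err.name assumes the thrown error reports the name RedactBindingError. The core remains the authority; this only fails earlier with a friendlier message.

```javascript
// Return a reason string for a bad selection, or null if it looks usable.
function checkSelection(selectedLeafIndices, unitCount) {
  const picked = [...new Set(selectedLeafIndices)];
  if (picked.length === 0) return "selection is empty";
  const bad = picked.find((i) => !Number.isInteger(i) || i < 0 || i >= unitCount);
  if (bad !== undefined) return `index ${bad} is outside 0..${unitCount - 1}`;
  if (picked.length === unitCount) return "selection reveals every unit";
  return null;
}

// Call any build function and convert a binding failure into a result object.
// ASSUMPTION: binding failures carry err.name === "RedactBindingError".
async function tryBuild(build, args) {
  try {
    return { ok: true, out: await build(args) };
  } catch (err) {
    if (err && err.name === "RedactBindingError") {
      return { ok: false, message: err.message };
    }
    throw err; // anything else is a genuine bug; do not swallow it
  }
}

// Usage with the vendored core (not executed here):
//   const reason = checkSelection([0, 2], proofsJson.merkle_leaves.length);
//   if (reason) throw new Error(reason);
//   const result = await tryBuild(buildRedactDisclosure, { /* §4 arguments */ });
//   if (!result.ok) console.error("redaction failed:", result.message);
```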

5. Worked example — standard CSV

A worked run for AcmeCorp's employees.csv, anchored earlier by ResearchAgent in standard csv-row-v1 mode. Reveal rows 0 and 2, withhold the rest.

// redact.mjs  —  node redact.mjs employees.csv employees.mbnt
import { execFileSync } from "node:child_process";
import { readFileSync, writeFileSync, mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Vendored disclosure-builder modules (see §2).
import { buildRedactDisclosure } from "./disclosure-builder/redact-core.mjs";
import { buildMbnt } from "./disclosure-builder/bundle.mjs";
import { verifyDisclosureCore } from "./disclosure-builder/verify-disclosure.mjs";

const [origPath, mbntPath] = process.argv.slice(2);

// 1. Read the original file bytes.
const originalFileBytes = new Uint8Array(readFileSync(origPath));

// 2. Unzip the source .mbnt and read its members.
//    execFileSync passes the paths as argv entries (no shell), so
//    spaces, quotes, or `$` in a path cannot be reinterpreted.
const dir = mkdtempSync(join(tmpdir(), "mbnt-"));
execFileSync("unzip", ["-o", mbntPath, "-d", dir], { stdio: ["ignore", "ignore", "inherit"] });
const carrierBytes = new Uint8Array(readFileSync(join(dir, "canonical.json")));
const canonicalJson = JSON.parse(Buffer.from(carrierBytes).toString("utf8"));
const proofsJson = JSON.parse(readFileSync(join(dir, "proofs.json"), "utf8"));
const manifest = JSON.parse(readFileSync(join(dir, "manifest.json"), "utf8"));

// 3. Anchor reference. `txid` comes from the source manifest;
//    `bundle_id` is REQUIRED (non-empty) by the core (its on-disk
//    spelling of proof_id). Server source manifests carry ONLY `txid`,
//    so supply the `proof_id` you saved from the POST /api/v1/anchors
//    response. Cryptographic binding is txid + on-chain root.
const PROOF_ID = process.env.SATSIGNAL_PROOF_ID; // saved from the anchor response
const anchorRef = {
  txid: manifest.txid,
  bundle_id: manifest.bundle_id || manifest.proof_id || PROOF_ID,
};

// 4. Build the disclosure — REVEAL rows 0 and 2, withhold the rest.
const out = await buildRedactDisclosure({
  originalFileBytes,
  canonicalJson,
  proofsJson,
  selectedLeafIndices: [0, 2],
  anchorRef,
});

// 5. Write the redacted copy. Extension comes from the presentation
//    format the core chose for this profile (csv -> .csv).
const fmt = out.disclosureBlock.presentation.format; // "csv" | "txt" | "json"
const ext = { csv: "csv", txt: "txt", json: "json" }[fmt] || "txt";
const base = origPath.replace(/\.[^.]+$/, "");
writeFileSync(`${base}.redacted.${ext}`, out.redactedCopyBytes);

// 6. Assemble the disclosure .mbnt: the disclosure block as the
//    manifest (JCS-canonical) + the carrier carried VERBATIM at
//    linked_anchor/canonical.json. NOTHING ELSE — no source
//    proofs.json, no source manifest.json (see §7).
const manifestBytes = new TextEncoder().encode(
  jcsCanonicalize({ disclosure: out.disclosureBlock, mbnt_version: "2.0" })
);
const zipBytes = buildMbnt({
  manifest: manifestBytes,
  linkedAnchorCanonical: carrierBytes,
});
writeFileSync(`${base}.disclosure.mbnt`, zipBytes);

// 7. (Optional) self-verify — same core the public /verify runs.
const onChainCommit = await sha256Hex(carrierBytes); // 64-hex; /verify uses the 40-hex on-chain slice
const result = await verifyDisclosureCore(out.disclosureBlock, {
  carrierBytes,
  onChainCommit,
  viewBytes: out.redactedCopyBytes,
});
console.log("verify:", JSON.stringify(result)); // { ok: true, fail_code: null }

// --- helpers -------------------------------------------------------
// Minimal JCS: sorted keys, compact separators. Matches the browser
// glue + the disclosure bundle's canonical manifest shape exactly.
function jcsCanonicalize(value) {
  return JSON.stringify(value, (_k, v) => {
    if (v && typeof v === "object" && !Array.isArray(v)) {
      const s = {};
      for (const k of Object.keys(v).sort()) s[k] = v[k];
      return s;
    }
    return v;
  });
}
async function sha256Hex(bytes) {
  const d = await crypto.subtle.digest("SHA-256", bytes);
  return [...new Uint8Array(d)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

The two output files — employees.redacted.csv and employees.disclosure.mbnt — are what you hand to the counterparty. They verify the pair against the existing on-chain anchor; you never re-anchor.

The buildMbnt argument shape above ({ manifest, linkedAnchorCanonical } only) is exactly what the browser glue and the bundle round-trip contract test assemble — mirror it precisely. Supplying a root canonical or a proofs member to a disclosure bundle is a bug (see §7).
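
The jcsCanonicalize helper in the script is deliberately minimal. The sketch below repeats it unchanged to show two things: the input's key order does not affect the output, and JavaScript objects always enumerate integer-like keys in ascending numeric order, so for such keys the helper can differ from RFC 8785's code-unit ordering (which would place "10" before "9"). Nothing in this guide suggests a disclosure manifest contains integer-like keys, so treat this as a caution for anyone reusing the helper elsewhere, not as a known defect.

```javascript
// Same helper as in redact.mjs: sorted keys, compact separators.
function jcsCanonicalize(value) {
  return JSON.stringify(value, (_k, v) => {
    if (v && typeof v === "object" && !Array.isArray(v)) {
      const s = {};
      for (const k of Object.keys(v).sort()) s[k] = v[k];
      return s;
    }
    return v;
  });
}

// Key order in the input does not matter, at any nesting depth.
const a = jcsCanonicalize({ mbnt_version: "2.0", disclosure: { z: 1, a: [{ y: 2, x: 1 }] } });
const b = jcsCanonicalize({ disclosure: { a: [{ x: 1, y: 2 }], z: 1 }, mbnt_version: "2.0" });
console.log(a === b); // true
console.log(a);       // {"disclosure":{"a":[{"x":1,"y":2}],"z":1},"mbnt_version":"2.0"}

// Caveat: integer-like keys come out in numeric order, whatever sort() returned.
console.log(jcsCanonicalize({ "10": 1, "9": 2 })); // {"9":2,"10":1}
```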

6. Text and JSON sources

The flow is identical — only the leaf module and a couple of knobs change.

text-line-v1 (newline-delimited log/text, one leaf per line, no header). The leaf module is text-line-v1-native.mjs, which redact-core.mjs pulls in transitively; selectedLeafIndices are line indices; the redacted copy is .txt with withheld lines replaced in position by [REDACTED].

json-keypath-v1 (a JSON object, one leaf per top-level key, keys sorted). The leaf module is json-keypath-v1-native.mjs, also pulled in transitively; selectedLeafIndices index the sorted key list. JSON adds a presentation choice via renderMode:

renderMode        redacted copy contains                        marker         proof
"drop" (default)  only the revealed keys; withheld keys absent  (key omitted)  identical
"mask"            all keys; withheld → "key":"[REDACTED]"       [REDACTED]     identical

import { buildRedactDisclosure } from "./disclosure-builder/redact-core.mjs";
// json-keypath-v1-native.mjs is pulled in transitively by redact-core.

const out = await buildRedactDisclosure({
  originalFileBytes,            // the raw JSON object bytes
  canonicalJson, proofsJson, anchorRef,
  selectedLeafIndices: [1, 2],  // sorted-key indices to reveal
  renderMode: "drop",           // or "mask"; default is "drop"
});
// out.disclosureBlock.presentation.format === "json" -> write out.json

renderMode is presentation-only: both renderings carry the identical proof (the merkle binding is over the revealed keys' leaves; the redacted copy's bytes feed only presentation.view_sha256, never an on-chain hash). renderMode (drop | mask) applies to every multi-mode profile — json-keypath-v1, json-ast-v1, and text-tree-v1 (§6b) — and is silently ignored for the single-mode line-oriented profiles (csv-row-v1, csv-column-v1, text-line-v1).
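
To make the two renderings concrete, here is an illustrative sketch of drop versus mask for a flat JSON object. It is not the core's renderer: renderJsonView is a hypothetical name, the key ordering uses JavaScript's default string sort as a stand-in for the profile's sorted-key rule, and the exact output bytes of the real tool are defined by the core, not by this example.

```javascript
// Render a redacted view of a flat object. `revealed` holds indices into
// the sorted key list, mirroring how selectedLeafIndices is described.
function renderJsonView(obj, revealed, mode = "drop") {
  const keep = new Set(revealed);
  const view = {};
  Object.keys(obj).sort().forEach((key, index) => {
    if (keep.has(index)) view[key] = obj[key];            // revealed: value in place
    else if (mode === "mask") view[key] = "[REDACTED]";   // mask: key stays, value hidden
    // drop: withheld key is simply omitted
  });
  return view;
}

const record = { to: "bob", amount: 5, from: "alice" };
// Sorted keys: amount (0), from (1), to (2). Reveal indices 1 and 2.
console.log(JSON.stringify(renderJsonView(record, [1, 2], "drop")));
// {"from":"alice","to":"bob"}
console.log(JSON.stringify(renderJsonView(record, [1, 2], "mask")));
// {"amount":"[REDACTED]","from":"alice","to":"bob"}
```

The trade-off is visible in the output: drop hides even the existence of a withheld key, while mask shows the recipient which keys were withheld.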

6b. Deep text — text-tree-v1 (one leaf per node)

text-tree-v1 is the multi-tier text profile: where text-line-v1 commits one leaf per whole line, text-tree-v1 commits one leaf for every node of a frozen decomposition: the whole file (path ""), every paragraph (/pN), every sentence (/pN/sM), and every token (/pN/sM/tK). So one anchor can disclose a single token, a whole sentence, a paragraph, or the file. It is sealed-only (the source .mbnt always carries a master salt; see §7), because token-granularity leaves have near-zero entropy. Anchor one headless first (see Headless anchor); this section redacts the result.

Because the reveal targets are nodes at four tiers, not row numbers, you choose which nodes to reveal from a listing rather than guessing. The CLI's satsignal-redact --list prints, per node, the reveal index, the copy-pastable selector (the slash path), the leaf_id, and a truncated value:

satsignal-redact --list mydoc.txt mydoc.mbnt

For the frozen example document Hi there. Don't go.\n\nBye-bye! (two paragraphs; Don't uses U+0027) it prints the 14 nodes in sorted-path / leaf-index order:

idx  path          node content (span)
  0  ""            "Hi there. Don't go.\n\nBye-bye!"   (whole file)
  1  /p0           "Hi there. Don't go.\n\n"           (paragraph)
  2  /p0/s0        "Hi there. "                        (sentence)
  3  /p0/s0/t0     "Hi"                                (token)
  4  /p0/s0/t1     "there"                             (token)
  5  /p0/s0/t2     "."                                 (token)
  6  /p0/s1        "Don't go.\n\n"                     (sentence)
  7  /p0/s1/t0     "Don't"                             (token)
  8  /p0/s1/t1     "go"                                (token)
  9  /p0/s1/t2     "."                                 (token)
 10  /p1           "Bye-bye!"                          (paragraph)
 11  /p1/s0        "Bye-bye!"                          (sentence)
 12  /p1/s0/t0     "Bye-bye"                           (token)
 13  /p1/s0/t1     "!"                                 (token)

Read the tier off the path depth: "" = whole file, /pN = paragraph, /pN/sM = sentence, /pN/sM/tK = token. (Note the deliberately dumb sentence rule split s0 at Hi there. because the . was followed by a space — see the profile spec §3.3.)
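
The depth rule is easy to mechanize when post-processing a listing. A minimal sketch, assuming selectors have exactly the four shapes shown above (tierOf is a hypothetical helper, not part of the tool):

```javascript
const TIERS = ["whole file", "paragraph", "sentence", "token"];

// Map a text-tree-v1 selector to its tier by counting path segments.
function tierOf(path) {
  if (path === "") return TIERS[0];
  if (!/^\/p\d+(?:\/s\d+(?:\/t\d+)?)?$/.test(path)) {
    throw new Error(`not a text-tree-v1 selector: ${JSON.stringify(path)}`);
  }
  // "/p0" splits into ["", "p0"], "/p0/s1" into ["", "p0", "s1"], and so on.
  return TIERS[path.split("/").length - 1];
}

console.log(tierOf(""));          // whole file
console.log(tierOf("/p0"));       // paragraph
console.log(tierOf("/p0/s1"));    // sentence
console.log(tierOf("/p1/s0/t0")); // token
```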

Now reveal a whole sentence (/p0/s1, index 6) plus a single token (/p1/s0/t0 = Bye-bye, index 12), withholding everything else. Reveal by name (the selector column) — recommended, and index-stable:

satsignal-redact mydoc.txt mydoc.mbnt --reveal-paths /p0/s1,/p1/s0/t0

or, equivalently, by numeric index:

satsignal-redact mydoc.txt mydoc.mbnt --reveal 6,12

For a JSON (json-ast-v1) source the names are RFC-6901 pointers, so use --reveal-pointers (an alias — both resolve against the same selector column): e.g. --reveal-pointers /from,/to,/classification reveals those whole subtrees and withholds the rest. Names match exactly; an unknown name is a hard error that lists the valid selectors. (Both flags accept comma-separated values and may be repeated.)

It writes mydoc.redacted.txt + mydoc.disclosure.mbnt, then self-verifies (same as §5/§8). The redacted .txt (presentation.format == "txt") renders each revealed node's span verbatim in document order, and collapses each maximal run of withheld nodes to one marker:

--mode          marker      redacted copy of the §6b reveal
drop (default)  […]         […]Don't go.\n\nBye-bye[…]
mask            [REDACTED]  [REDACTED]Don't go.\n\nBye-bye[REDACTED]

The withheld first sentence collapses to the leading marker; the withheld ! token to the trailing one. The revealed sentence Don't go.\n\n and the revealed token Bye-bye are adjacent in the canonical string, so there is no marker between them. Both modes carry the identical proof — the marker choice feeds only presentation.view_sha256, never the on-chain root.
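
The collapsing rule can be reproduced on the example document with plain string offsets. This is an illustration of the rule as described above, not the core's renderer: renderSpans is a hypothetical helper, and the spans are located here with indexOf, whereas the real tool takes them from the profile's frozen decomposition.

```javascript
// Render revealed [start, end) spans verbatim in document order and
// collapse every maximal withheld run into a single marker.
function renderSpans(text, revealedSpans, marker = "[…]") {
  const spans = [...revealedSpans].sort((x, y) => x[0] - y[0]);
  let out = "";
  let pos = 0; // everything before pos has already been emitted or withheld
  for (const [start, end] of spans) {
    if (start > pos) out += marker;                // withheld gap before this span
    if (end > pos) {
      out += text.slice(Math.max(start, pos), end); // revealed text, no double-emit
      pos = end;
    }
  }
  if (pos < text.length) out += marker;            // withheld tail
  return out;
}

const doc = "Hi there. Don't go.\n\nBye-bye!";
const sentence = "Don't go.\n\n"; // node /p0/s1
const token = "Bye-bye";          // node /p1/s0/t0
const spans = [sentence, token].map((s) => [doc.indexOf(s), doc.indexOf(s) + s.length]);

console.log(JSON.stringify(renderSpans(doc, spans)));
// "[…]Don't go.\n\nBye-bye[…]"
console.log(JSON.stringify(renderSpans(doc, spans, "[REDACTED]")));
// "[REDACTED]Don't go.\n\nBye-bye[REDACTED]"
```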

Because the source is sealed, the tool reads the 32-byte master salt out of the source .mbnt's manifest.json automatically — you do not pass it on the CLI. It derives the per-leaf salts, and the disclosure carries only the revealed nodes' per-leaf salts; the master salt is never shipped in mydoc.disclosure.mbnt (the master-salt-strip rule, §7). Reveal-index arithmetic, the entry preimage, and the per-leaf HKDF salt are all pinned in the profile spec.

7. Standard vs sealed — the master-salt rule

Standard source (chunk_merkle.algo == "sha256"): pass no salt. Revealed units carry their plain canonical value. Note the documented tradeoff: standard leaves are bare sha256, so a withheld unit of low entropy can be brute-forced from the published leaf hash by anyone who can guess its candidate set. If that matters, the source should have been anchored sealed.
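
The tradeoff is easy to demonstrate. In the sketch below, the leaf is assumed to be a bare sha256 of the unit's UTF-8 text; the real preimage rule is pinned per scheme and will differ in detail, but the attack has the same shape for any unsalted hash. The values and helper names are made up for illustration.

```javascript
import { createHash } from "node:crypto";

// ASSUMPTION: leaf = sha256(utf8(value)). The real preimage differs in detail.
const leafOf = (value) => createHash("sha256").update(value, "utf8").digest("hex");

// What a recipient sees for a withheld unit: only its leaf hash.
const publishedLeaf = leafOf("salary=85000");

// Anyone who can enumerate plausible values can test each one.
function guessWithheld(leafHex, candidates) {
  return candidates.find((c) => leafOf(c) === leafHex) ?? null;
}

// 201 candidates: salaries from 50000 to 150000 in steps of 500.
const candidates = Array.from({ length: 201 }, (_, i) => `salary=${50000 + i * 500}`);
console.log(guessWithheld(publishedLeaf, candidates)); // salary=85000
```

A sealed source defeats this because each leaf is keyed by a per-leaf salt the attacker does not have.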

Sealed source (chunk_merkle.algo == "merkle-hmac-sha256"): you must pass the 32-byte master salt, read from the source manifest.json's salt_b64 (base64url):

// base64url -> 32 raw bytes
const b64 = manifest.salt_b64.replace(/-/g, "+").replace(/_/g, "/");
const masterSaltBytes = new Uint8Array(
  Buffer.from(b64 + "=".repeat((4 - (b64.length % 4)) % 4), "base64")
);
const out = await buildRedactDisclosure({
  originalFileBytes, canonicalJson, proofsJson, anchorRef,
  selectedLeafIndices: [0, 2],
  masterSaltBytes,            // REQUIRED for sealed; omit for standard
});

The core uses the master salt only to derive the per-leaf salts, then strips it from every output. A revealed unit carries its own per-leaf salt_b64; the master salt and any withheld unit's per-leaf salt never appear in the disclosure block, the redacted copy, or anywhere the core returns — this is the master-salt-strip rule, enforced by a hard guard that aborts the build if the salt would leak in any encoding. Never ship the master salt. Pass masterSaltBytes to a sealed source or the build hard-fails; pass it to a standard source and it also hard-fails (the algos must match the carrier).
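
The core's guard is authoritative, but a caller can add a belt-and-braces check of its own before anything leaves the machine. The sketch below is a hypothetical helper (findSaltLeak is not part of the modules) that scans output bytes for the master salt in a few common encodings; the list of encodings is an assumption and is not claimed to match what the core checks.

```javascript
// Return the name of the first encoding in which the master salt appears
// inside outputBytes, or null if none of the checked encodings is found.
function findSaltLeak(outputBytes, masterSaltBytes) {
  const out = Buffer.from(outputBytes);
  const salt = Buffer.from(masterSaltBytes);
  const hex = salt.toString("hex");
  const needles = {
    raw: salt,
    hex: Buffer.from(hex),
    HEX: Buffer.from(hex.toUpperCase()),
    base64: Buffer.from(salt.toString("base64").replace(/=+$/, "")), // padding stripped
    base64url: Buffer.from(salt.toString("base64url")),
  };
  for (const [encoding, needle] of Object.entries(needles)) {
    if (out.includes(needle)) return encoding;
  }
  return null;
}

// Usage before shipping (not executed here):
//   const manifestText = JSON.stringify(out.disclosureBlock);
//   for (const bytes of [Buffer.from(manifestText), out.redactedCopyBytes]) {
//     const leak = findSaltLeak(bytes, masterSaltBytes);
//     if (leak) throw new Error(`master salt found in output (${leak})`);
//   }
```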

8. Optional self-verify

verifyDisclosureCore is the same core the public /verify page runs — verify your own output before you hand it over:

const result = await verifyDisclosureCore(out.disclosureBlock, {
  carrierBytes,        // the verbatim source canonical.json bytes
  onChainCommit,       // sha256(carrierBytes); compared at the commit's hex width
  viewBytes: out.redactedCopyBytes,  // optional presentation view-hash check
});
// { ok: true, fail_code: null }  on success

onChainCommit is compared against the leading hex chars of sha256(carrierBytes), so both the live 40-hex on-chain commitment and the full 64-hex hash work. A non-ok result returns a fail_code you can branch on.
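
The width-tolerant comparison can be pictured as a prefix match on the hex digest. A minimal sketch of that idea, not the verifier's actual code: commitMatches is a hypothetical helper, and accepting only 40-hex and 64-hex commitments is a conservative assumption based on the two widths named above.

```javascript
import { createHash } from "node:crypto";

// True when onChainCommit equals the leading hex chars of sha256(carrierBytes).
function commitMatches(carrierBytes, onChainCommit) {
  const commit = String(onChainCommit).toLowerCase();
  const widthOk = commit.length === 40 || commit.length === 64; // assumed widths
  if (!widthOk || !/^[0-9a-f]+$/.test(commit)) return false;
  const full = createHash("sha256").update(carrierBytes).digest("hex");
  return full.startsWith(commit);
}

const carrier = new TextEncoder().encode("abc"); // stand-in for canonical.json bytes
const full = createHash("sha256").update(carrier).digest("hex");
console.log(commitMatches(carrier, full));              // true (64-hex)
console.log(commitMatches(carrier, full.slice(0, 40))); // true (40-hex slice)
console.log(commitMatches(carrier, "00".repeat(20)));   // false

// Branching on the self-verify result from the call above (not executed here):
//   if (!result.ok) {
//     console.error("disclosure failed self-verify:", result.fail_code);
//     process.exit(1);
//   }
```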

9. What this does NOT do

10. Where this fits

11. Packaged form — the satsignal-redact SDK / CLI

Status: not yet published. satsignal-disclosure-redact / satsignal-redact are not on npm yet; npm install will 404. Until they ship, use the vendored-modules path in §2 + §5 (the no-package recipe), which is the current way to run this headless today. The SDK/CLI shapes shown below are stable and forward-compatible.

Everything in §4–§8 is also wrapped in a small reference package, so you don't have to hand-write the read-unzip-build-write glue. It is the same pure modules this guide vendors — no new cryptography, no upload, no key — exposed as one Node call and one command. There is still no hosted endpoint: the package runs entirely on your machine.

JS API. One async call does the whole §5 flow:

import { redactFromMbnt } from "satsignal-disclosure-redact";

const out = await redactFromMbnt("employees.csv", "employees.mbnt", {
  reveal: [0, 2],        // 0-based unit indices to REVEAL …
  // revealNames: ["/from", "/to"], // … OR by name (paths/pointers); pick one
  // renderMode: "drop", // multi-mode profiles: "drop" (default) | "mask"
  // outDir: "out/",     // defaults next to the original file
});
// out.redactedCopyPath, out.disclosureMbntPath, out.rootHex,
// out.verify === { ok: true, fail_code: null }

It throws RedactBindingError on a non-matching pair, a wrong sealed master salt, a bad selection, or an unknown revealNames entry — the same hard-fail as §4. (resolveRevealNames(originalFileBytes, members, names) is also exported if you want to map names → indices yourself.)
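
Mapping names to indices is an exact-match lookup against the selector listing. The sketch below shows the idea with the §6b selectors; namesToIndices is a hypothetical stand-in written for this guide, not the package's resolveRevealNames, whose arguments and internals are as stated above.

```javascript
// Resolve selector names to reveal indices; unknown names are a hard error
// that lists the valid selectors.
function namesToIndices(selectors, names) {
  const indexBySelector = new Map(selectors.map((s, i) => [s, i]));
  return names.map((name) => {
    if (!indexBySelector.has(name)) {
      const valid = selectors.map((s) => JSON.stringify(s)).join(", ");
      throw new Error(`unknown selector ${JSON.stringify(name)}; valid selectors: ${valid}`);
    }
    return indexBySelector.get(name);
  });
}

// The 14 selectors of the §6b example document, in leaf-index order.
const selectors = [
  "", "/p0", "/p0/s0", "/p0/s0/t0", "/p0/s0/t1", "/p0/s0/t2",
  "/p0/s1", "/p0/s1/t0", "/p0/s1/t1", "/p0/s1/t2",
  "/p1", "/p1/s0", "/p1/s0/t0", "/p1/s0/t1",
];
console.log(namesToIndices(selectors, ["/p0/s1", "/p1/s0/t0"])); // [ 6, 12 ]
```

Selecting by name keeps a script stable if the listing's indices shift, which is why the by-name flags are the recommended form.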

CLI.

satsignal-redact <original-file> <source.mbnt> --reveal 0,2 \
    [--mode drop|mask] [-o out/]
# or select by NAME (the --list `selector` column), index-stable:
satsignal-redact memo.txt    memo.txt.source.mbnt    --reveal-paths /p0,/p4
satsignal-redact record.json record.json.source.mbnt --reveal-pointers /from,/to

It writes <base>.redacted.<ext> + <base>.disclosure.mbnt, then prints the bound root and the self-verify result. Pass exactly one of --reveal (indices) or --reveal-paths / --reveal-pointers / --reveal-names (the aliased by-name selectors: comma-separated, repeatable, matched exactly against --list). --mode (drop | mask) applies to the multi-mode profiles (json-keypath-v1, json-ast-v1, and text-tree-v1) and is ignored for the single-mode line-oriented profiles (csv-row-v1, csv-column-v1, text-line-v1). The package has zero third-party dependencies and needs only Node 18+: no npm install for crypto, and no system unzip (it reads the .mbnt with Node's built-in zlib).

Availability. The package re-exports the public modules in §2 (it does not fork them), so its behavior is identical to the hand-written recipe above. A standalone published release may follow; until then, vendoring the modules (§2 + §5) is the way to run this headless today, and the API and CLI shapes shown here are stable.

Questions about this specification? Email hello@satsignal.cloud.