csv-row-v1 — selective-disclosure profile for CSV row leaves (native anchor rule)
Authority. This profile documents the native csv-row-v1 leaf rule that every standard CSV anchor already commits. It does not invent a rule — it writes down, to the byte, the rule the anchor path has always produced. Selective disclosure binds to this native rule (rather than to a disclosure-specific salted scheme), and the profile literal is a forever-contract.
Reframe note (read this first). An earlier draft of this profile (the satsignal.csv.row.v1 dotted literal) defined a salted leaf rule with disclosure-specific random per-row salts and a profile||0x00||leaf_id||0x00||value||0x00||salt preimage. That dotted scheme is now deprecated / inert. No production flow emits or consumes it; its allowlist literal and its frozen Stage 1–3 corpus are retained forever as a regression guard, never used. This spec now documents the hyphenated csv-row-v1 literal that anchors actually emit: an unsalted, header-excluded, duplicate-last rule. See §9 for the deprecation pointer.
Versioning. The profile literal is the hyphenated "csv-row-v1" — the exact string a standard CSV anchor stamps into subject.proofs.chunk_merkle.scheme. The shape evolves additively as v1.x: new fixture coverage and clarifying prose MAY be added; every existing anchor recomputes to identical leaf hashes; and the segmentation / canonicalization / leaf-hash / merkle rules below are fixed forever for this literal — because they are the rule every on-chain csv-row-v1 anchor already committed. A bug in any of these rules can never be patched in place; the only remedy is a new sibling literal that compatible verifiers support in parallel. This profile covers both modes that share this literal: the standard mode (algo: "sha256", unsalted; §§2–8) and the sealed mode (algo: "merkle-hmac-sha256", per-leaf HKDF salts; §5b). The two modes share the §2 canonicalization, §3 header-exclusion, and §6 duplicate-last merkle byte-for-byte — they differ only in the per-leaf hash. The mode a verifier applies is selected by the carrier chunk_merkle.algo, never by the literal alone.
Status: native rework, 2026-05-29. Audience: anchorers who anchor a CSV at time T1 (standard mode) and later produce a validated redacted copy revealing specific rows under disclosure-v1.md; verifier authors who must recompute a row leaf from (value) alone and walk it into the merkle root the original anchor committed. Goal: pin one canonical byte-level rule for "given a CSV file, what is data-row leaf N, and what bytes does its leaf hash cover?" — to the byte, matching the anchor code, with adversarial fixtures, forever.
1. Why this exists
A CSV file is the simplest plaintext-shaped artifact with a natural per-row leaf-set: invoices (one row = one line item), event logs (one row = one event), ledgers (one row = one transaction), appointment manifests, allow-lists. A standard CSV anchor at time T1 already commits a csv-row-v1 chunk_merkle root over per-data-row leaves. Selective row disclosure at any later time T2 reveals a subset of those committed leaves and proves them into that existing root — no re-anchoring, no re-disclosing the rest, and (for standard mode) no salt keyfile: producing a validated redacted copy needs only the original file + its .mbnt.
This profile defines that segmentation. Its forever-contract scope is narrow on purpose:
- One delimiter. Comma 0x2C only. Tabs / semicolons / pipes are not this profile; a future driver would carry its own forever-pin.
- One leaf per data row. The header row (row 0) is excluded; every remaining row is one leaf, in document order. See §3: this is the load-bearing flip from the retired dotted profile, which hashed every row including row 0.
- One hash construction per mode. Standard mode (§4) is the bare sha256(utf8(canonical_data_row)). Sealed mode (§5b) shares the literal with algo: "merkle-hmac-sha256" and replaces the bare hash with an HMAC under a per-leaf HKDF salt.
The narrow scope is deliberate: a profile this small can be canonicalized to the byte and exhaustively fixture-tested.
2. Inputs and canonicalization
The anchorer feeds raw bytes — the source CSV file as it exists on disk. Before any leaf extraction, the verifier (and the redact-from-original tool) MUST apply the canonicalization rules below in the order given. These rules match parseCsv / csvField / csvRow in web/templates.py byte-for-byte.
Decision (forever): Encoding is UTF-8. Source bytes are decoded as UTF-8 before parsing. The canonical row strings are re-encoded to UTF-8 for hashing.
Decision (forever): Strip ONE leading BOM. If the first decoded code point is U+FEFF, it is removed before parsing — and only the single leading one. A BOM-emitting exporter (Excel, etc.) and a non-BOM-emitting one produce the same leaf-set for the same logical content. (Mechanically: parseCsv checks text.charCodeAt(0) === 0xFEFF and slices it off.)
Decision (forever): RFC-4180 quote-aware parse. A field opens a quoted region on a "; inside a quoted region a "" pair is a literal " and a lone " closes the region; , outside quotes ends a field; an unquoted LF / CR / CRLF ends a row. A ,, LF, or CR inside a quoted field is content, not a separator.
Decision (forever): Row break on unquoted LF / CR / CRLF. All three terminators delimit rows; a CRLF is consumed as a single break (the parser advances past the \n after a \r). Line endings are thus normalized implicitly by re-emission (see below) — a CRLF file and the equivalent LF file produce identical canonical rows and identical leaves.
Decision (forever): No trailing-newline empty row. A trailing terminator does not emit an empty final row. Concretely (matching the anchor): after the parse loop, a final row is appended only if the last field or row buffer is non-empty (if (field.length || row.length)). A file ending …,Writer\n and a file ending …,Writer produce the same rows.
Decision (forever): Minimal re-quote per field (csvField). Each parsed field is re-emitted as follows: if the field contains any of " , LF (0x0A) CR (0x0D), it is wrapped in " and every internal " is doubled to ""; otherwise it is emitted bare. This is the exact predicate /[",\n\r]/ from csvField.
- Consequence (differs from the retired dotted profile): because re-quoting is minimal, a field that was quoted in the source only to wrap an empty string ("") parses to the empty string and re-emits bare. So a source data row "",x,y canonicalizes to ,x,y. The retired salted profile claimed "" stayed distinct from a bare empty field; under the native rule the parse+minimal-re-quote normalizes them to the same canonical bytes. (Quotes that are required — e.g. around a field containing a comma — are of course preserved, because the field still contains a ,.)
Decision (forever): Canonical row = fields joined by ,; canonical doc = canonical rows joined by \n (LF), no trailing newline. The canonical document (rows joined by LF) is what the csv-norm-v1 content hash covers; the per-row leaves are taken from the canonical rows individually (see §3). No whitespace is trimmed; empty fields are legal (,, is a row of three empty cells, because two commas separate three fields).
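The canonicalization decisions above can be sketched in Python as follows. This is a minimal, unofficial sketch, not the anchor's parseCsv / csvField / csvRow; the names parse_csv, csv_field, and canonical_rows are hypothetical, and the behavior on inputs the prose does not pin (for example a blank line in the middle of a file, or a quote character in the middle of an unquoted field) is an assumption.

```python
import re


def parse_csv(text: str) -> list[list[str]]:
    """Quote-aware parse following the decisions above (sketch)."""
    if text[:1] == "\ufeff":          # strip ONE leading BOM only
        text = text[1:]
    rows: list[list[str]] = []
    row: list[str] = []
    field = ""
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field += '"'      # "" inside quotes is a literal "
                    i += 1
                else:
                    in_quotes = False  # lone " closes the quoted region
            else:
                field += ch            # , LF and CR are content here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append(field)
            field = ""
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                 # CRLF is consumed as a single break
            row.append(field)
            rows.append(row)
            row, field = [], ""
        else:
            field += ch
        i += 1
    if field or row:                   # no trailing-newline empty row
        row.append(field)
        rows.append(row)
    return rows


def csv_field(value: str) -> str:
    """Minimal re-quote: wrap only when the field contains " , LF or CR."""
    if re.search(r'[",\n\r]', value):
        return '"' + value.replace('"', '""') + '"'
    return value


def canonical_rows(raw: bytes) -> list[str]:
    """Decode as UTF-8, parse, and re-emit each row with fields joined by a comma."""
    return [",".join(csv_field(f) for f in fields)
            for fields in parse_csv(raw.decode("utf-8"))]
```

Under this sketch, LF, CRLF, BOM-prefixed, and no-trailing-newline variants of the same content yield the same list of canonical row strings, and a source row `"",x,y` re-emits as `,x,y`.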
3. Leaf extraction — HEADER EXCLUDED
THE FLIP: read carefully. This is the single most important behavioral difference from the retired dotted satsignal.csv.row.v1 profile, which hashed every row including row 0. Under the native csv-row-v1 rule, row 0 is ALWAYS dropped before the leaf-set is built, regardless of whether it is semantically a header. A verifier or redaction tool that includes row 0 as a leaf will compute a different leaf-set and a different root and will not bind to any real anchor.
Decision (forever): The header row (row 0) is EXCLUDED from the leaf-set. After canonicalization (§2), the leaf-set is canonicalLines.slice(1) — the data rows, dropping the first canonical row. (Matching the anchor: const dataRows = canonicalLines.slice(1).)
- Leaf index ↔ file row. Leaf index i (zero-based) is data row i, which is original file row i + 1. Leaf r000000 is the second row of the file (the first data row); the first file row is the excluded header.
- The anchor's header_included: true metadata is misleading and MUST be ignored. Standard-anchor proofs.json carries a metadata object with header_included: true and data_row_count. The leaf-set is always header-excluded regardless of that flag; do not index off it. (It is anchor wire metadata, not part of the leaf rule, and is not touched by this profile.)
Decision (forever): Leaf ordering is data-row document order, zero-indexed. Leaf 0 is the first data row, leaf 1 the second, and so on, in the order they appear in the canonicalized file. The merkle leaf-set order is this order, unchanged. The verifier does NOT re-sort. Document order is the only ordering a verifier can derive from the raw bytes without consulting anchorer intent.
Decision (forever): leaf_id is r<N> with N = the DATA-ROW index zero-padded to six decimal digits. Examples: r000000 (first data row), r000001, r042195. The format is the ASCII literal r followed by exactly six ASCII decimal digits of the data-row index (NOT the file-row index). leaf_id is a display / ordering handle only — it is NOT part of any hash preimage (see §4). Six digits support up to 1,000,000 data rows.
Decision (forever): Empty input is invalid; a header-only file has zero data leaves and is invalid input. leaf_count is the data-row count = canonicalLines.length - 1. An empty file (zero bytes, or zero bytes after BOM strip) has no rows and is invalid. A file with exactly one row (a header and no data rows) has zero data leaves: the anchor emits no chunk_merkle for it, so it cannot be a disclosure source and is invalid input under this profile. A valid csv-row-v1 disclosure source has leaf_count ≥ 1 (i.e. ≥ 2 file rows: one header + ≥ 1 data row).
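The extraction rules in this section can be sketched as a single function over the canonical rows of §2. This is a hedged illustration: the name data_row_leaves is hypothetical, and rejecting more than 1,000,000 data rows is an assumption drawn from the six-digit leaf_id format (the profile does not say what an implementation does beyond that limit).

```python
def data_row_leaves(canonical_rows: list[str]) -> list[tuple[str, str]]:
    """Return (leaf_id, value) pairs for the data rows, header excluded."""
    if len(canonical_rows) < 2:
        # Empty input, or a header with no data rows: zero leaves, invalid source.
        raise ValueError("csv-row-v1 needs a header row plus at least one data row")
    data_rows = canonical_rows[1:]            # row 0 is ALWAYS dropped
    if len(data_rows) > 1_000_000:
        # Assumption: r<N> must stay exactly six digits, so refuse larger inputs.
        raise ValueError("more data rows than a six-digit leaf_id can index")
    # Document order is kept; leaf i is original file row i + 1.
    return [(f"r{i:06d}", row) for i, row in enumerate(data_rows)]
```

The leaf_id produced here is only a display and ordering handle; it never enters a hash preimage.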
4. Leaf hash — bare sha256 of the canonical data row (standard mode)
This section defines the exact bytes that go into SHA-256 to produce a data-row leaf's leaf_hash in standard mode (algo: "sha256").
Decision (forever): The leaf hash is the BARE sha256 of the canonical data-row string's UTF-8 bytes:
leaf_hash = SHA-256( utf8( canonical_data_row ) )
There is no profile literal in the preimage, no leaf_id, no salt, and no 0x00 separators. The preimage is exactly the canonical row's UTF-8 bytes — nothing else. (Matching the anchor: sha256Hex(enc.encode(L)) where L is the canonical data-row string.)
This replaces the retired dotted profile's salted preimage (profile_literal || 0x00 || leaf_id || 0x00 || value || 0x00 || salt_raw) in its entirety. Standard csv-row-v1 carries no salt_b64 on its revealed entries (see §5).
The value a disclosure carries for a revealed standard leaf is the canonical re-quoted row string (§2): the exact bytes the leaf hash covers, including any quote characters the minimal re-quote rule preserved. The verifier hashes utf8(value) and compares to leaf_hash; it does not re-canonicalize.
Worked example (NOT placeholders — computed against the anchor rule)
Input CSV bytes (string view, \n = LF byte 0x0A):
name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n
Canonicalization (§2) yields four canonical rows; row 0 is the header and is EXCLUDED (§3), leaving three data-row leaves:
| leaf_id | file row | value (canonical row) | value UTF-8 (hex) | leaf_hash = sha256(utf8(value)) |
|---|---|---|---|---|
| n/a | 0 (header) | name,age,role (EXCLUDED) | n/a | not a leaf (528a70… if hashed, but it is never hashed) |
| r000000 | 1 | Alice,42,Engineer | 416c6963652c34322c456e67696e656572 (17 B) | 3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a |
| r000001 | 2 | Bob,35,Designer | 426f622c33352c44657369676e6572 (15 B) | 701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731 |
| r000002 | 3 | Carol,29,Writer | 4361726f6c2c32392c577269746572 (15 B) | f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d |
leaf_count = 3 (data rows; header excluded). A verifier MUST reproduce the three leaf_hash values exactly from the listed value bytes. If it does not, a step in the canonicalization or the bare-sha256 leaf hash is wrong; debug against the fixtures in §8 before doing anything else.
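A verifier author can check the worked example with a few lines of Python. The sketch below is illustrative only: standard_leaf_hash and PUBLISHED are hypothetical names, and the hash strings are copied from the table above rather than asserted by this snippet, which simply reports whether a local recomputation agrees with them.

```python
import hashlib


def standard_leaf_hash(value: str) -> str:
    """Bare SHA-256 over the canonical row's UTF-8 bytes; no salt, tag, or separator."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Values and leaf hashes copied from the worked-example table.
PUBLISHED = {
    "Alice,42,Engineer": "3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a",
    "Bob,35,Designer": "701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731",
    "Carol,29,Writer": "f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d",
}

for value, published in PUBLISHED.items():
    status = "match" if standard_leaf_hash(value) == published else "MISMATCH"
    print(f"{value!r}: {status}")
```

A MISMATCH here would point at the local hashing step (wrong bytes fed to SHA-256), since the value strings are already canonical.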
5. Salts — standard mode is UNSALTED (privacy posture is first-class)
Decision (forever): Standard csv-row-v1 is UNSALTED. There is no salt in the leaf hash (§4) and salt_b64 is ABSENT from standard revealed-leaf entries (revealed[i] carries {leaf_id, profile, value, leaf_hash, proof_path} — no salt_b64). Do not synthesize an empty or zero salt; the field is simply not present.
Privacy posture (must be understood, not "fixed"). A standard anchor's redacted rows are protected only against a party who cannot enumerate the unknown row content. Understate this at your peril; the honest characterization is stronger than "an incidental proof_path sibling leaks":
- The standard .mbnt publishes EVERY leaf hash, including redacted rows. proofs.json carries merkle_leaves = the complete ordered list of every data-row leaf hash, redacted rows included. A holder of the standard bundle therefore has the exact sha256(canonical_row) of each withheld row and can guess-and-confirm it entirely offline; it does not depend on a withheld row happening to sit on a revealed leaf's proof_path.
- Zero per-leaf entropy ⇒ identical withheld rows have identical leaf hashes. Because the leaf is sha256(canonical_row) with no salt, no leaf_id, and no profile tag (§4), two redacted rows with the same canonical content produce the same leaf hash. An observer of the bundle can therefore tell which redacted rows share a value (cross-equality leak) even without recovering the value.
- A withheld row's recovery cost equals THAT ROW'S OWN entropy. Rows whose canonical content is drawn from a small or structured space (booleans, enums, status flags, small integers, dates, names from a known list, or known-format identifiers such as phone numbers, SSNs, or currency amounts) are trivially recoverable: an adversary confirms a guess by hashing it once and comparing to the published leaf hash. Only genuinely high-entropy free-form row content is protected.
This is the documented, anchor-time-chosen cost of standard mode — it is not a defect to "fix" by adding salts. Standard mode gives the disclosed-rows-only membership guarantee while leaving every redacted row recoverable at the cost of its own entropy. Do NOT use standard mode to withhold low-entropy or small-space sensitive rows (anything an adversary could enumerate): route that data to sealed mode (per-leaf HKDF salts, documented in §5b), where redacted rows are unguessable because the verifier cannot derive the per-leaf salt without the anchor's master salt and equal withheld rows do not collide. The user chooses this tradeoff when they anchor (standard vs sealed), not at disclosure time — choose sealed before anchoring if any withheld row could be low-entropy.
What audits MUST NOT re-litigate: do NOT re-propose disclosure-specific salts or a salt keyfile for the standard path, and do NOT treat the standard redacted-row brute-forceability as a defect to patch with salts. Sealed is the answer. No keyless scheme can protect a redacted row that is itself the guessable secret; that is the cost of the no-keyfile requirement and the reason sealed exists.
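To make the standard-mode posture concrete, the following sketch shows the offline guess-and-confirm attack against a single withheld row. It is an illustration under stated assumptions: recover_row and the candidate lists are hypothetical, and the candidate rows are built by plain comma-joining, which is only valid when no field needs re-quoting under §2.

```python
import hashlib
from itertools import product


def leaf(row: str) -> str:
    """Standard-mode leaf: bare SHA-256 of the canonical row (§4)."""
    return hashlib.sha256(row.encode("utf-8")).hexdigest()


def recover_row(published_leaf_hash: str, names, ages, roles):
    """Enumerate a small candidate space and confirm each guess with one hash."""
    for name, age, role in product(names, ages, roles):
        candidate = f"{name},{age},{role}"
        if leaf(candidate) == published_leaf_hash:
            return candidate
    return None  # the row lies outside the enumerated space


# A "redacted" row whose leaf hash is still published in merkle_leaves.
withheld_hash = leaf("Bob,35,Designer")
guess = recover_row(
    withheld_hash,
    names=["Alice", "Bob", "Carol"],
    ages=range(18, 100),
    roles=["Engineer", "Designer", "Writer"],
)
```

The search space here is 3 × 82 × 3 = 738 candidates, so recovery costs at most 738 hashes. Under sealed mode (§5b) the same enumeration has nothing to test against, because the per-leaf HMAC key is unknown.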
5b. Sealed mode — HMAC leaf under a per-leaf HKDF salt (algo: "merkle-hmac-sha256")
Sealed mode is the privacy path (§5). A sealed CSV anchor commits the same leaf-set segmentation as standard — same §2 canonicalization, same §3 header-exclusion (slice(1), leaf i = file row i + 1), same §6 duplicate-last merkle — but replaces the bare sha256 leaf (§4) with an HMAC under a per-leaf salt derived from the anchor's master salt by HKDF. The user chooses sealed vs standard at anchor time (§5); a disclosure binds to whichever the anchor committed and never re-derives the mode. The sealed rule below mirrors the anchor (web/templates.py deriveLeafSalt / hmacSha256 / merkleRootFromLeafBytes, sealed CSV leaf loop) and SPEC_v2_sealed.md §3.3 byte-for-byte; it is the authoritative selective-disclosure statement of the same construction.
Decision (forever): The per-leaf salt is HKDF-SHA256 of the master salt, with a fixed namespace and a per-leaf big-endian counter. For data-row leaf index i (zero-based; leaf i is file row i + 1, header already excluded):
salt_i = HKDF-SHA256(
ikm = master_salt, # the 32-byte bearer secret
salt = utf8("satsignal-sealed-v1/per-leaf"),
info = utf8("chunk/") || u32_be(i), # "chunk/" then 4-byte BIG-ENDIAN i
L = 32 # output length, bytes
)
- master_salt is the 32-byte high-entropy salt the anchor generated at anchor time; in a sealed .mbnt it lives in manifest.json's salt_b64 (base64url) and is flagged bearer_secret: true. It is the HKDF IKM, never a leaf value.
- The HKDF salt parameter is the exact ASCII bytes satsignal-sealed-v1/per-leaf (28 bytes). Note that the namespace is sealed-v1 and the info prefix is chunk/; this is the anchor's actual chunk_merkle derivation. Do not confuse it with the unrelated merkle-row-sealed-v1 table scheme's …merkle-row-sealed-v1/per-leaf row/ namespace in SPEC_merkle_row.md §3.2, which is a different, JSON-row, client-side scheme (see §9).
- The HKDF info is the literal ASCII chunk/ (6 bytes) followed by the 4-byte big-endian encoding of the data-row index i (u32_be(0) = 00 00 00 00, u32_be(1) = 00 00 00 01, …).
- L = 32: each per-leaf salt is exactly 32 bytes.
Decision (forever): The sealed leaf hash is the HMAC-SHA256 of the canonical data-row's UTF-8 bytes, keyed by that row's per-leaf salt:
leaf_hash_i = HMAC-SHA256( key = salt_i, msg = utf8( canonical_data_row_i ) )
The msg is exactly the §2 canonical re-quoted row string's UTF-8 bytes — identical to the standard-mode preimage (§4); only the construction changes from a bare sha256 to a keyed HMAC under salt_i. There is no profile literal, no leaf_id, and no 0x00 separator in the HMAC message.
Decision (forever): The merkle is the §6 duplicate-last tree over the raw 32-byte HMAC commitments. The leaves are the raw 32-byte leaf_hash_i values (not hex, not re-hashed); parents are SHA-256(raw(left) || raw(right)) with the same duplicate-last self-pair on an odd node as standard mode (§6). The inner-node hashing is plain SHA-256 — the per-leaf HMAC is the only salting point, exactly as SPEC_v2_sealed.md §3.3 states. A single-leaf tree's root is that leaf itself.
Decision (forever): The carrier pins algo: "merkle-hmac-sha256" and salt_version: "salt_v1". A sealed CSV anchor stamps subject.proofs.chunk_merkle = {scheme: "csv-row-v1", algo: "merkle-hmac-sha256", salt_version: "salt_v1", leaf_count, root}. The (subject_profile, chunk_merkle.algo) pair ("csv-row-v1", "merkle-hmac-sha256") selects this sealed rule; ("csv-row-v1", "sha256") selects the standard rule (§4). A verifier MUST branch on the carrier algo; the literal alone does not disambiguate.
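The sealed derivation above maps directly onto HKDF (RFC 5869) and HMAC from the Python standard library. The sketch below is an unofficial illustration, not the anchor's deriveLeafSalt / hmacSha256; the function names are hypothetical, and HKDF is written out by hand because the standard library has no HKDF helper.

```python
import hashlib
import hmac
import struct

HKDF_SALT = b"satsignal-sealed-v1/per-leaf"   # 28 ASCII bytes, fixed namespace


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF-SHA256 per RFC 5869: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()           # HKDF-Extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                     # HKDF-Expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_leaf_salt(master_salt: bytes, i: int) -> bytes:
    """salt_i for data-row leaf index i (zero-based, header already excluded)."""
    info = b"chunk/" + struct.pack(">I", i)                      # 4-byte big-endian i
    return hkdf_sha256(master_salt, HKDF_SALT, info, 32)


def sealed_leaf_hash(master_salt: bytes, i: int, canonical_row: str) -> bytes:
    """Raw 32-byte HMAC-SHA256 commitment for data row i."""
    key = derive_leaf_salt(master_salt, i)
    return hmac.new(key, canonical_row.encode("utf-8"), hashlib.sha256).digest()
```

Only the holder of master_salt can call derive_leaf_salt; a verifier of a disclosure receives salt_i directly and skips the HKDF step.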
5b.1 What a sealed disclosure carries — per-leaf salt, NEVER the master
Decision (forever): A revealed sealed leaf carries salt_b64 = the PER-LEAF salt salt_i, base64-encoded — NEVER the master salt. A sealed revealed[i] entry is {leaf_id, profile: "csv-row-v1", value: <canonical re-quoted row string>, salt_b64: base64(salt_i), leaf_hash: HMAC(salt_i, utf8(value)), proof_path}. The verifier recomputes the leaf as HMAC-SHA256(base64decode(salt_b64), utf8(value)) and compares it to leaf_hash, then walks proof_path to the committed root (disclosure-v1.md §7 step 4). For sealed leaves salt_b64 is REQUIRED; a sealed carrier with salt_b64 missing fails the disclosure closed.
Decision (forever) — THE MASTER-SALT-STRIP RULE (security requirement, forever). A csv-row-v1 sealed disclosure .mbnt MUST NOT contain the source anchor's manifest.json, the 32-byte master salt, or any field carrying it (salt_b64 at the manifest/bearer level, bearer_secret: true, etc.). The disclosure carries only the per-leaf salts of the revealed rows, inside each revealed[i] entry. The redact-from-original tool reads the master salt from the source .mbnt manifest.json, derives the per-leaf salts of the revealed rows only, emits them in revealed[], and strips the master salt from all output. This is forever-load-bearing: shipping the master salt in a disclosure lets anyone re-derive every per-leaf salt via HKDF and brute-force (or directly recompute the HMAC of) every redacted row — it unseals the entire table. A disclosure that ships the master salt has defeated the whole point of sealed mode.
Why revealing per-leaf salts of revealed rows is safe. HKDF-Expand is a PRF: revealing salt_i (the output for info = "chunk/" || u32_be(i)) leaks neither the IKM (master_salt) nor any other output salt_j (j ≠ i). So a sealed disclosure that publishes salt_i for each revealed row lets a verifier recompute exactly those rows' HMAC leaves and prove their membership, while every redacted row's commitment stays opaque — its per-leaf salt is underivable without the master salt, and HMAC under an unknown 32-byte key is not brute-forceable even for a low-entropy row value. The formal argument is SPEC_v2_sealed.md §5.3.
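The leaf-recomputation step for a sealed revealed entry can be sketched as below. This is a hedged illustration, not the reference verifier: verify_sealed_leaf is a hypothetical name, the entry is assumed to be an already-parsed dict, salt_b64 is assumed to be standard base64 as in the worked example, leaf_hash is assumed to be lowercase hex, and rejecting salts that are not exactly 32 bytes is an assumption based on L = 32. The proof_path walk to the committed root is a separate step (§6).

```python
import base64
import hashlib
import hmac


def verify_sealed_leaf(entry: dict) -> bool:
    """Recompute HMAC-SHA256(salt_i, utf8(value)) and compare to leaf_hash."""
    salt_b64 = entry.get("salt_b64")
    if not salt_b64:
        return False                      # sealed leaf without salt_b64: fail closed
    try:
        salt = base64.b64decode(salt_b64, validate=True)
    except ValueError:                    # binascii.Error is a ValueError subclass
        return False
    if len(salt) != 32:                   # assumption: per-leaf salts are exactly 32 bytes
        return False
    # The value is hashed as carried; the verifier does not re-canonicalize it.
    recomputed = hmac.new(salt, entry["value"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(recomputed, entry["leaf_hash"])
```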
5b.2 Privacy posture
Sealed mode is the answer to standard mode's brute-forceability (§5). A standard anchor's redacted rows are confirmable in one sha256 by a party who can guess a low-entropy row. A sealed anchor's redacted rows are unguessable: the leaf is HMAC(salt_i, row) under a per-leaf salt the verifier cannot derive without the master salt, so no candidate row can be tested against an undisclosed commitment. This is the privacy path the user opts into at anchor time (§5). It is achieved with no extra keyfile — the per-leaf salts are HKDF-derived from the master salt the anchor already persisted in the source .mbnt. What audits MUST NOT re-litigate: sealed is the privacy answer; standard's brute-forceability is the documented, anchor-time-chosen cost of the no-keyfile standard path, not a defect to patch with salts.
5b.3 Worked example (NOT placeholders — computed against the anchor rule)
These values are computed by HKDF-SHA256 + HMAC-SHA256 + the duplicate-last merkle over the exact same primitives the browser anchor uses (deriveLeafSalt / hmacSha256 / merkleRootFromLeafBytes). They are NOT placeholders. A frozen sealed corpus lands at tests/vectors/disclosure-v1/csv_row_v1_sealed/; these inline vectors keep this section self-contained.
Fixed test master salt = the 32 bytes 0x00 0x01 … 0x1f (000102…1e1f). In a sealed source .mbnt, this is manifest.json's salt_b64 (base64url, the bearer secret, stripped from any disclosure):
master_salt (hex) = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
manifest.json salt_b64 (base64url, NEVER shipped in a disclosure)
= AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8
Input CSV (same as §4's example; header name,age,role is row 0 and is EXCLUDED, leaving three data-row leaves):
name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n
Per-leaf salts (salt_i = HKDF(master_salt, "satsignal-sealed-v1/per-leaf", "chunk/"||u32_be(i), 32)) and leaves (leaf_hash_i = HMAC(salt_i, utf8(value))):
| leaf_id | file row | value (canonical row) | salt_b64 = base64(salt_i), the PER-LEAF salt (std base64) | leaf_hash = HMAC(salt_i, utf8(value)) |
|---|---|---|---|---|
| r000000 | 1 | Alice,42,Engineer | qMoAQfGknOScBChILtGnu9aA1VXa16fyY79Nvu/NOUA= | b4d1776516e344977142e8605cc5c23cb28b3590cf6f6ff38078acb774b851b9 |
| r000001 | 2 | Bob,35,Designer | nE9TkuI5Ift2NOmTOuzvcRkUcnxu4iWJyOUW//86buE= | d725af39959bef81bf9ece86a6509622cfe3581a27a353a0fd6098a6150bbb07 |
| r000002 | 3 | Carol,29,Writer | dmVYjcVZH2NJO8hMQMr4GcbCXyVA168ORyLoUiP7Occ= | 09a03eac822c3c2c0dbd685722d2b5654b6572c53e05f22acb41d87fc3d9275d |
leaf_count = 3. The salt_b64 values above are the per-leaf HKDF outputs (std base64, 32 bytes each); each appears in its row's revealed[] entry. None of them is the master salt — a disclosure revealing all three still leaks nothing about the master or about any redacted row in a larger table (HKDF-Expand is a PRF). Note also that the leaf_hash values differ entirely from §4's bare-sha256 leaves for the same rows: a sealed carrier and a standard carrier of the same CSV commit to different roots — they are distinct anchors, distinguished by algo.
Merkle (duplicate-last over the raw 32-byte HMAC commitments; L0[i] = leaf_hash_i):
L1[0] = SHA-256( raw(L0[0]) || raw(L0[1]) )
= cdfce3bff059980a6fabfb54a8e84091cc9f72f5a8d6251f3726decaa38eb45b
L1[1] = SHA-256( raw(L0[2]) || raw(L0[2]) ) ← DUPLICATE-LAST (Carol self-pairs)
= b9a45f0dca4ec4eabaf6ca7a6835b9511da3f6bc62247fa08fc1581ab860676b
ROOT = SHA-256( raw(L1[0]) || raw(L1[1]) )
= 2207e09f1cafe3cb7099d905d47eef8c998d42a0b2413b3a0a0413110f47f6a3
Proof paths a sealed disclosure carries to reveal each leaf (all walk to ROOT — identical tree shape to §6's standard example, including Carol's two-entry self-sibling path; only the underlying hash values differ because the leaves are HMACs):
reveal r000000 (Alice):
salt_b64 = "qMoAQfGknOScBChILtGnu9aA1VXa16fyY79Nvu/NOUA="
leaf_hash = "b4d1776516e344977142e8605cc5c23cb28b3590cf6f6ff38078acb774b851b9"
proof_path = [
{ "side": "R", "hash": "d725af39959bef81bf9ece86a6509622cfe3581a27a353a0fd6098a6150bbb07" }, // L0[1] (Bob)
{ "side": "R", "hash": "b9a45f0dca4ec4eabaf6ca7a6835b9511da3f6bc62247fa08fc1581ab860676b" } // L1[1]
]
reveal r000001 (Bob):
salt_b64 = "nE9TkuI5Ift2NOmTOuzvcRkUcnxu4iWJyOUW//86buE="
leaf_hash = "d725af39959bef81bf9ece86a6509622cfe3581a27a353a0fd6098a6150bbb07"
proof_path = [
{ "side": "L", "hash": "b4d1776516e344977142e8605cc5c23cb28b3590cf6f6ff38078acb774b851b9" }, // L0[0] (Alice)
{ "side": "R", "hash": "b9a45f0dca4ec4eabaf6ca7a6835b9511da3f6bc62247fa08fc1581ab860676b" } // L1[1]
]
reveal r000002 (Carol): ← the odd last node, TWO-entry self-sibling path
salt_b64 = "dmVYjcVZH2NJO8hMQMr4GcbCXyVA168ORyLoUiP7Occ="
leaf_hash = "09a03eac822c3c2c0dbd685722d2b5654b6572c53e05f22acb41d87fc3d9275d"
proof_path = [
{ "side": "R", "hash": "09a03eac822c3c2c0dbd685722d2b5654b6572c53e05f22acb41d87fc3d9275d" }, // L0[2] ITSELF — self-sibling
{ "side": "L", "hash": "cdfce3bff059980a6fabfb54a8e84091cc9f72f5a8d6251f3726decaa38eb45b" } // L1[0]
]
A sealed verifier MUST reproduce each leaf_hash exactly from HMAC(base64decode(salt_b64), utf8(value)) and then walk the path to ROOT. If it does not, the HKDF derivation, the HMAC, or the duplicate-last merkle is wrong; debug against this vector before anything else.
6. Merkle behavior — DUPLICATE-LAST on odd
Decision (forever): The merkle is DUPLICATE-LAST on odd nodes, matching merkleRootFromHexLeaves. At each level, nodes are paired left-to-right; when a node is unpaired (the last node at an odd-count level), its right sibling is itself (right = (i+1 < len) ? level[i+1] : level[i]). The parent is SHA-256(raw(left) || raw(right)) — raw 32-byte concatenation, no domain tag. A single-leaf tree's root is that leaf itself (proof_path = []).
This replaces the retired dotted profile's promote-unchanged odd-node rule (where an unpaired node was lifted to the next level without re-hashing). Promote-unchanged and duplicate-last produce different roots for any odd-count level. The native anchor is duplicate-last; this profile is duplicate-last.
The disclosure verifier never rebuilds the root — it only walks proof_path (decode each 64-hex sibling and the frontier to raw bytes, concatenate sibling||frontier for side:"L" or frontier||sibling for side:"R", SHA-256, repeat; the final frontier MUST equal the committed root). The walk is structure-agnostic: it folds whatever siblings it is given and verifies both promote-unchanged and duplicate-last paths. Only the proof-path builder (the redact-from-original tool) encodes the duplicate-last shape — by emitting a self-sibling entry for an odd-promoted node.
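The two halves of this section, the duplicate-last root and the structure-agnostic walk, can be sketched as follows. The names merkle_root_duplicate_last and walk_proof_path are hypothetical; this is an illustration of the rule as stated here, not the anchor's merkleRootFromHexLeaves.

```python
import hashlib


def merkle_root_duplicate_last(leaves: list[bytes]) -> bytes:
    """Root over raw 32-byte leaves; an unpaired last node pairs with itself."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else level[i]
            nxt.append(hashlib.sha256(left + right).digest())   # no domain tag
        level = nxt
    return level[0]            # a single-leaf tree's root is the leaf itself


def walk_proof_path(leaf_hash_hex: str, proof_path: list[dict], root_hex: str) -> bool:
    """Fold the given siblings onto the frontier and compare with the committed root."""
    frontier = bytes.fromhex(leaf_hash_hex)
    for step in proof_path:
        sibling = bytes.fromhex(step["hash"])
        if step["side"] == "L":
            frontier = hashlib.sha256(sibling + frontier).digest()
        elif step["side"] == "R":
            frontier = hashlib.sha256(frontier + sibling).digest()
        else:
            return False       # unknown side marker
    return frontier.hex() == root_hex
```

The walker contains no odd-node logic at all; the duplicate-last shape lives entirely in the siblings the builder chose to emit.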
Worked example (the §4 three-leaf tree)
Leaves (data rows; L0[i] = leaf i):
L0[0] = 3147617d…800a (Alice,42,Engineer)
L0[1] = 701287f2…9731 (Bob,35,Designer)
L0[2] = f5edf8ce…988d (Carol,29,Writer)
Level 1 (3 leaves → odd; the last node self-pairs under duplicate-last):
L1[0] = SHA-256( raw(L0[0]) || raw(L0[1]) )
= 4d6704c7c8fe0ad82fefbd7c7b530d8eb6087ff369568d6e763801ab9f07b5e6
L1[1] = SHA-256( raw(L0[2]) || raw(L0[2]) ) ← DUPLICATE-LAST (self-pair), NOT promote-unchanged
= e49b438fe484c909f9795172f8aea598123c422bcf7e11f257c53ecd5875609d
ROOT = SHA-256( raw(L1[0]) || raw(L1[1]) )
= 19d82f92265bc904b4f356b1f69bb418e96bca56e57785d2d1ae7c1acc8d5e3e
Tree:

                              ROOT 19d82f…5e3e
                                     |
                  --------------------------------------
                  |                                    |
          L1[0] 4d6704…b5e6                    L1[1] e49b43…609d
          = H( L0[0] || L0[1] )                = H( L0[2] || L0[2] ) (self-pair)
                  |                                    |
          ----------------                (Carol pairs with herself)
          |              |                             |
        L0[0]          L0[1]                         L0[2]
     Alice,42,…      Bob,35,…                     Carol,29,…
Proof paths a disclosure carries to reveal each leaf (all walk to ROOT):
reveal r000000 (Alice):
proof_path = [
{ "side": "R", "hash": "701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731" }, // L0[1] (Bob)
{ "side": "R", "hash": "e49b438fe484c909f9795172f8aea598123c422bcf7e11f257c53ecd5875609d" } // L1[1]
]
reveal r000001 (Bob):
proof_path = [
{ "side": "L", "hash": "3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a" }, // L0[0] (Alice)
{ "side": "R", "hash": "e49b438fe484c909f9795172f8aea598123c422bcf7e11f257c53ecd5875609d" } // L1[1]
]
reveal r000002 (Carol): ← the odd last node, TWO-entry path
proof_path = [
{ "side": "R", "hash": "f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d" }, // L0[2] ITSELF — self-sibling
{ "side": "L", "hash": "4d6704c7c8fe0ad82fefbd7c7b530d8eb6087ff369568d6e763801ab9f07b5e6" } // L1[0]
]
The Carol path is the duplicate-last signature. Carol is the odd-promoted node at level 0. Under duplicate-last her sibling at level 0 is her own leaf hash ({side:"R", hash: L0[2]}), which folds to H(L0[2]||L0[2]) = L1[1], and then she folds against L1[0] at level 1. So her path has two entries. Under the retired promote-unchanged rule the same reveal had a one-entry path (Carol skipped level 0 and folded directly against L1[0]). The proof-path walk verifies both; only the builder differs. This is the canonical illustration of the odd-node rule for csv-row-v1.
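A proof-path builder that emits the self-sibling entry might look like the sketch below. It is an assumption-laden illustration of the builder side (the redact-from-original tool is not shown in this profile); build_proof_path is a hypothetical name and the leaves are raw 32-byte hashes.

```python
import hashlib


def build_proof_path(leaves: list[bytes], index: int) -> list[dict]:
    """Duplicate-last proof path for leaves[index], as {side, hash} entries."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    path: list[dict] = []
    level, pos = list(leaves), index
    while len(level) > 1:
        if pos % 2 == 0:
            # Left child: the sibling is the right neighbour, or the node itself
            # when it is the unpaired last node (the self-sibling entry).
            sibling = level[pos + 1] if pos + 1 < len(level) else level[pos]
            path.append({"side": "R", "hash": sibling.hex()})
        else:
            path.append({"side": "L", "hash": level[pos - 1].hex()})
        # Build the next level with the same duplicate-last pairing.
        level = [
            hashlib.sha256(level[i] + (level[i + 1] if i + 1 < len(level) else level[i])).digest()
            for i in range(0, len(level), 2)
        ]
        pos //= 2
    return path
```

For a three-leaf tree, the path for index 2 has two entries and the first one carries the leaf's own hash with side R, matching the Carol path above.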
7. Original anchor binding
A standard CSV anchor commits the csv-row-v1 leaf-set under the original .mbnt canonical document's subject.proofs.chunk_merkle field. The required pins (matching what buildCsvProofs stamps):
| canonical field | required value under this profile |
|---|---|
| subject.proofs.chunk_merkle.scheme | exactly "csv-row-v1" (the hyphenated literal anchors emit) |
| subject.proofs.chunk_merkle.algo | "sha256" (standard mode; sealed mode uses "merkle-hmac-sha256") |
| subject.proofs.chunk_merkle.leaf_count | the data-row count (header excluded, §3) |
| subject.proofs.chunk_merkle.root | duplicate-last merkle root over the data-row leaves (§6) |
A disclosure under disclosure-v1.md carries this same literal in disclosure.linked_anchor.subject_profile; each revealed leaf's profile field MUST equal "csv-row-v1". The verifier binds to the chunk_merkle the anchor already committed — there is no re-anchor and no new scheme. The binding chain (master spec §4) walks revealed[i].value → leaf_hash → linked_anchor.root → original canonical-doc chunk_merkle.root → on-chain document_hash.
Forbidden-variant note (flipped). A verifier MUST apply this native rule (bare-sha256 leaf, header-excluded, duplicate-last merkle) to an anchor whose chunk_merkle.scheme == "csv-row-v1" and algo == "sha256". It MUST NOT apply the retired dotted profile's salted preimage to a csv-row-v1 anchor. The dotted literal satsignal.csv.row.v1 is a different, deprecated scheme (§9); if a verifier sees that dotted scheme it MUST NOT apply this native rule to it, and vice versa. The distinguishing key is the (subject_profile, chunk_merkle.algo) pair: ("csv-row-v1", "sha256") is this standard rule; ("csv-row-v1", "merkle-hmac-sha256") is the sealed rule (§5b).
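The mode selection described in this note amounts to a lookup on the (scheme, algo) pair that rejects everything else. The sketch below is illustrative: select_leaf_rule is a hypothetical name, chunk_merkle is assumed to be the already-parsed carrier object, and rejecting a sealed carrier whose salt_version is not "salt_v1" is an assumption based on the §5b carrier pin.

```python
def select_leaf_rule(chunk_merkle: dict) -> str:
    """Return "standard" or "sealed" for a csv-row-v1 carrier, else raise."""
    key = (chunk_merkle.get("scheme"), chunk_merkle.get("algo"))
    if key == ("csv-row-v1", "sha256"):
        return "standard"          # bare sha256 leaf (§4)
    if key == ("csv-row-v1", "merkle-hmac-sha256"):
        if chunk_merkle.get("salt_version") != "salt_v1":
            # Assumption: an unknown salt_version is refused rather than guessed.
            raise ValueError("sealed csv-row-v1 carrier must pin salt_version salt_v1")
        return "sealed"            # HMAC leaf under a per-leaf HKDF salt (§5b)
    # Includes the deprecated dotted literal: this native rule never applies to it.
    raise ValueError(f"unsupported (scheme, algo) pair: {key!r}")
```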
8. Fixtures (test vectors)
All leaf_hash / root values below were computed by SHA-256 over the bytes defined in §4/§6 (verified against the anchor code and the reference implementation). They are NOT placeholders. A verifier that does not reproduce them from the listed inputs has a bug.
A frozen native corpus lands at tests/vectors/disclosure-v1/csv_row_v1_native/: positive disclosures plus negatives (tampered value → leaf_hash_mismatch, wrong proof path → merkle_path_mismatch, header-included mistake). The following inline vectors keep this spec self-contained (a profile spec is not complete without vectors).
N1: minimal — header + 3 data rows, LF endings
Input (string view; \n = LF):
name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n
Canonical rows: ["name,age,role", "Alice,42,Engineer", "Bob,35,Designer", "Carol,29,Writer"]. Header name,age,role is row 0 and is EXCLUDED. leaf_count = 3. Leaves, tree, and the three proof paths are the §4/§6 worked example:
| leaf_id | value | leaf_hash |
|---|---|---|
| r000000 | Alice,42,Engineer | 3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a |
| r000001 | Bob,35,Designer | 701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731 |
| r000002 | Carol,29,Writer | f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d |
root = 19d82f92265bc904b4f356b1f69bb418e96bca56e57785d2d1ae7c1acc8d5e3e. Revealed entries carry no salt_b64.
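For readers who want to recompute N1 by hand, here is a minimal sketch. It assumes the standard-mode leaf is SHA-256 over the UTF-8 bytes of the canonical data row (the "bare-sha256 leaf" named in this profile) and that parents hash the concatenated raw child digests with the last node duplicated on odd levels; §4 and §6 remain authoritative. The helper names and the empty-input behavior are this sketch's own choices.

```python
import hashlib


def leaf_hash(canonical_row: str) -> bytes:
    # Assumption: bare SHA-256 over utf8(canonical_data_row), no salt, no prefix.
    return hashlib.sha256(canonical_row.encode("utf-8")).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    # Duplicate-last merkle: an odd level pairs its last node with itself.
    if not leaves:
        raise ValueError("no data rows")  # sketch-only choice, not from the spec
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


rows = ["name,age,role", "Alice,42,Engineer", "Bob,35,Designer", "Carol,29,Writer"]
data_rows = rows[1:]  # row 0 is the header and is excluded
leaves = [leaf_hash(r) for r in data_rows]
root = merkle_root(leaves)

for i, leaf in enumerate(leaves):
    print(f"r{i:06d}", leaf.hex())
print("root", root.hex())
```

Compare the printed hex against the table and root above; any difference points at the canonicalization or the leaf preimage rather than at the tree.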
N2: CRLF input — same content as N1 with \r\n between rows
name,age,role\r\nAlice,42,Engineer\r\nBob,35,Designer\r\nCarol,29,Writer\r\n
Canonical rows are byte-identical to N1 (CRLF row-breaks re-emit as LF-joined canonical rows; leaves are per-row strings, unaffected). leaf_count = 3; the three leaves and root are identical to N1. Pins that CRLF and LF sources produce the same leaf-set.
N3: BOM-prefixed input — U+FEFF then N1's content
[U+FEFF] name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n
The single leading BOM is stripped before parsing; canonical rows, leaves, and root are identical to N1. Pins that a BOM-emitting exporter and a non-BOM one produce the same leaf-set.
N4: trailing newline absent — N1's content without the final \n
name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer
No trailing terminator → no empty final row (§2). Canonical rows, leaves, and root are identical to N1. Pins that the leaf hash covers the row's content, not any terminator.
N5: even leaf count — header + 2 data rows (clean pairing)
h\nAlice,42,Engineer\nBob,35,Designer
leaf_count = 2; leaves r000000 (Alice,42,Engineer, 3147617d…800a) and r000001 (Bob,35,Designer, 701287f2…9731). With an even count there is no self-pair:
root = SHA-256( raw(L0[0]) || raw(L0[1]) )
= 4d6704c7c8fe0ad82fefbd7c7b530d8eb6087ff369568d6e763801ab9f07b5e6
(Note this equals L1[0] of N1, as expected.) Proof paths: r000000 → [{R, 701287f2…9731}]; r000001 → [{L, 3147617d…800a}].
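The N5 proof paths can be walked with a short sketch. It reads the side tag as the sibling's position, which is what the two N5 paths imply (r000000 is the left child, so its sibling is tagged R). The tuple representation of a path step and the function name root_from_path are assumptions of this sketch, not the wire format of disclosure-v1.md.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_from_path(leaf: bytes, path: list[tuple[str, bytes]]) -> bytes:
    # Each step is (side, sibling_digest); side says where the sibling sits.
    cur = leaf
    for side, sibling in path:
        if side == "R":
            cur = sha256(cur + sibling)  # sibling on the right
        elif side == "L":
            cur = sha256(sibling + cur)  # sibling on the left
        else:
            raise ValueError(f"unknown side tag: {side!r}")
    return cur


# N5 inputs, assuming the bare-sha256 leaf over the UTF-8 row bytes.
l0 = sha256("Alice,42,Engineer".encode("utf-8"))
l1 = sha256("Bob,35,Designer".encode("utf-8"))
n5_root = sha256(l0 + l1)

print(root_from_path(l0, [("R", l1)]) == n5_root)
print(root_from_path(l1, [("L", l0)]) == n5_root)
```

An empty path returns the leaf unchanged, which matches the single-leaf vectors where proof_path = [] and root == leaf.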
N6: quoted comma — data row "Smith, John",42,Engineer
Source (header col + one data row): col\n"Smith, John",42,Engineer. The comma inside the quoted field is content; the parsed field still contains a comma, so the minimal re-quote preserves the quotes. leaf_count = 1 (single-leaf tree, proof_path = [], root == leaf):
| leaf_id | value | leaf_hash |
|---|---|---|
| r000000 | "Smith, John",42,Engineer | 233e061e7c3a3ddca8bc8812161444335337c4af179f556443e9e03dac05c34e |
Pins quote-aware parsing (no split on the embedded comma) and quote preservation for a required quote.
N7: escaped quote — data row "He said ""hi""",x,y
Source: col\n"He said ""hi""",x,y. The "" pairs are a literal " inside the quoted field; the field contains ", so minimal re-quote preserves the wrapping quotes and the doubled internal quotes verbatim. leaf_count = 1:
| leaf_id | value | leaf_hash |
|---|---|---|
| r000000 | "He said ""hi""",x,y | c0ccf8ba1b4cb1731873f2d907935949d8baa5682c4c9df1ed016e6ff8b869ba |
Pins that the canonical bytes carry the "" escape verbatim (the canonicalizer does NOT decode "" to " before hashing).
N8: embedded LF in a quoted field — data row "line1\nline2",x
Source: col\n"line1\nline2",x where \n (0x0A) sits inside the quoted field (not a row break). RFC-4180 quoting keeps it as content; the field contains \n, so minimal re-quote preserves the wrapping quotes. leaf_count = 1 (the embedded LF does NOT split the row):
| leaf_id | value (escaped) | leaf_hash |
|---|---|---|
| r000000 | "line1\nline2",x | d6af32b6bb9204df6131de7052a1631afcd99014ecb4e8a1825a41cdf60b4c0f |
(The \n in value is the literal LF byte between line1 and line2.) Pins that quote-aware row splitting is part of the contract: an LF inside a quoted field is content, not a terminator.
N9: empty cells — data row ,,foo,,
Source: h1,h2,h3,h4,h5\n,,foo,,. The data row has five fields (four empty + foo); none needs quoting, so all re-emit bare. leaf_count = 1:
| leaf_id | value | leaf_hash |
|---|---|---|
| r000000 | ,,foo,, | 52ed76e0ab0db728fcfb6631d642019cf766a28767bb2eb0612cff50d7953e9e |
Pins that empty cells are legal and the delimiters at empty-cell positions are part of the canonical bytes.
N10: quoted-empty normalizes to bare — data row "",x,y
Source: col\n"",x,y. The first field is an empty quoted string; it parses to the empty string, which contains none of the characters ", comma, LF, or CR, so minimal re-quote emits it bare. The canonical row is therefore ,x,y (NOT "",x,y). leaf_count = 1:
| leaf_id | source field | canonical value | leaf_hash |
|---|---|---|---|
| r000000 | "" (quoted) | ,x,y | a8c72b638eee690282c29d60ecd295e2454d8c938b14af94f84fabc651b999e6 |
Pins the minimal-re-quote consequence (§2): under the native rule a source "",x,y and a source ,x,y canonicalize to the same bytes and hash to the same leaf. (This is the documented behavioral difference from the retired dotted profile, which kept "" distinct.)
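The canonicalization behavior pinned by N1–N4 and N6–N10 can be approximated with the sketch below. It uses Python's csv module as a stand-in for the quote-aware parser and re-emits each field with quotes only when it contains ", a comma, LF, or CR. This is an assumption-laden illustration, not the anchor's canonicalizer: §2 is authoritative, and cases the vectors do not cover (blank lines, a CR inside a quoted field) are not modeled here.

```python
import csv
import io

NEEDS_QUOTE = ('"', ",", "\n", "\r")


def emit_field(field: str) -> str:
    # Minimal re-quote: wrap only when required, doubling internal quotes.
    if any(ch in field for ch in NEEDS_QUOTE):
        return '"' + field.replace('"', '""') + '"'
    return field


def canonical_rows(text: str) -> list[str]:
    # Strip a single leading BOM before parsing.
    if text.startswith("\ufeff"):
        text = text[1:]
    # newline="" lets csv.reader see the raw line endings, so CRLF and LF
    # row breaks are both handled and an LF inside quotes stays content.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [",".join(emit_field(f) for f in row) for row in reader]


n1 = "name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n"
print(canonical_rows(n1))
print(canonical_rows('col\n"",x,y'))
```

Row 0 of the returned list is the header; the leaf-set is built from the rows after it.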
9. Out of scope / deprecation pointers
- Sealed mode: DOCUMENTED in §5b (this profile). The sealed leaf rule shares this profile literal "csv-row-v1" with algo: "merkle-hmac-sha256": same canonicalization, header-exclusion, and duplicate-last merkle as standard, but the leaf is HMAC-SHA256(key = per-leaf HKDF salt, msg = utf8(canonical_data_row)) and revealed[i] carries the per-leaf salt (never the master salt) in salt_b64. It is the privacy path (§5). See §5b for the byte-precise derivation, the master-salt-strip security requirement, and the worked example; the authoritative anchor-side construction is SPEC_v2_sealed.md §3.3 and the disclosure verifier branch is disclosure-v1.md §4.5 / §7.
- NOT this profile: the merkle-row-sealed-v1 table scheme. SPEC_merkle_row.md documents a different, client-side, JSON-row commit/reveal scheme (merkle-row-v1 / merkle-row-sealed-v1). Its sealed variant uses a different HKDF namespace (satsignal-merkle-row-sealed-v1/per-leaf + info = "row/"||u32_be(i)) over JCS-canonicalized JSON rows; it is not the chunk_merkle rule a CSV anchor commits and is not a disclosure source for this profile. This profile's sealed mode (§5b) pins the anchor's actual chunk_merkle derivation (satsignal-sealed-v1/per-leaf + chunk/), per SPEC_v2_sealed.md §3.3. Do not cross the two namespaces.
- The retired dotted satsignal.csv.row.v1 scheme: DEPRECATED / INERT. The salted dotted literal (random per-row CSPRNG salts, profile||0x00||leaf_id||0x00||value||0x00||salt preimage, every-row-is-a-leaf, promote-unchanged merkle) is retired. The literal "satsignal.csv.row.v1" is retained in the _VALID_MERKLE_SCHEMES allowlist forever (an allowlist literal is never removed) and its frozen disclosure corpus is retained as a regression guard, but no production flow emits or consumes it. Do not delete references the frozen salted corpus depends on; the dotted scheme is simply deprecated and unused. New CSV disclosures bind to the native hyphenated csv-row-v1 rule documented here.
- Non-comma delimiters / header-aware (column) schemas / semantic typing / partial-row (cell) disclosure / files > 1,000,000 data rows. All out of scope; each would be a separate forever-contract literal. Type assertions belong in the disclosure's claims block, not in the leaf hash.
Every future profile gets its own literal. This profile's literal is the hyphenated csv-row-v1, fixed forever as the rule on-chain CSV anchors already commit, applied to anchors whose subject_profile == "csv-row-v1" under either carrier algo: chunk_merkle.algo == "sha256" selects the standard rule (§§2–8), chunk_merkle.algo == "merkle-hmac-sha256" (salt_version: "salt_v1") selects the sealed rule (§5b). A verifier distinguishes the two by the (subject_profile, chunk_merkle.algo) pair.
Questions about this specification? Email hello@satsignal.cloud.