csv-row-v1 — selective-disclosure profile for CSV row leaves (native anchor rule)

Authority. This profile documents the native csv-row-v1 leaf rule that every standard CSV anchor already commits. It does not invent a rule — it writes down, to the byte, the rule the anchor path has always produced. Selective disclosure binds to this native rule (rather than to a disclosure-specific salted scheme), and the profile literal is a forever-contract.

Reframe note (read this first). An earlier draft of this profile (the satsignal.csv.row.v1 dotted literal) defined a salted leaf rule with disclosure-specific random per-row salts and a profile||0x00||leaf_id||0x00||value||0x00||salt preimage. That dotted scheme is now deprecated / inert. No production flow emits or consumes it; its allowlist literal and its frozen Stage 1–3 corpus are retained forever as a regression guard, never used. This spec now documents the hyphenated csv-row-v1 literal that anchors actually emit — an unsalted, header-excluded, duplicate-last rule. See §9 for the deprecation pointer.

Versioning. The profile literal is the hyphenated "csv-row-v1" — the exact string a standard CSV anchor stamps into subject.proofs.chunk_merkle.scheme. The shape evolves additively as v1.x: new fixture coverage and clarifying prose MAY be added; every existing anchor recomputes to identical leaf hashes; and the segmentation / canonicalization / leaf-hash / merkle rules below are fixed forever for this literal — because they are the rule every on-chain csv-row-v1 anchor already committed. A bug in any of these rules can never be patched in place; the only remedy is a new sibling literal that compatible verifiers support in parallel. This profile covers both modes that share this literal: the standard mode (algo: "sha256", unsalted; §§2–8) and the sealed mode (algo: "merkle-hmac-sha256", per-leaf HKDF salts; §5b). The two modes share the §2 canonicalization, §3 header-exclusion, and §6 duplicate-last merkle byte-for-byte — they differ only in the per-leaf hash. The mode a verifier applies is selected by the carrier chunk_merkle.algo, never by the literal alone.

Status: native rework, 2026-05-29.

Audience: anchorers who anchor a CSV at time T1 (standard mode) and later produce a validated redacted copy revealing specific rows under disclosure-v1.md; verifier authors who must recompute a row leaf from (value) alone and walk it into the merkle root the original anchor committed.

Goal: pin one canonical byte-level rule for "given a CSV file, what is data-row leaf N, and what bytes does its leaf hash cover?", to the byte, matching the anchor code, with adversarial fixtures, forever.

1. Why this exists

A CSV file is the simplest plaintext-shaped artifact with a natural per-row leaf-set: invoices (one row = one line item), event logs (one row = one event), ledgers (one row = one transaction), appointment manifests, allow-lists. A standard CSV anchor at time T1 already commits a csv-row-v1 chunk_merkle root over per-data-row leaves. Selective row disclosure at any later time T2 reveals a subset of those committed leaves and proves them into that existing root — no re-anchoring, no re-disclosing the rest, and (for standard mode) no salt keyfile: producing a validated redacted copy needs only the original file + its .mbnt.

This profile defines that segmentation. Its forever-contract scope is narrow on purpose:

The narrow scope is deliberate: a profile this small can be canonicalized to the byte and exhaustively fixture-tested.

2. Inputs and canonicalization

The anchorer feeds raw bytes — the source CSV file as it exists on disk. Before any leaf extraction, the verifier (and the redact-from-original tool) MUST apply the canonicalization rules below in the order given. These rules match parseCsv / csvField / csvRow in web/templates.py byte-for-byte.

Decision (forever): Encoding is UTF-8. Source bytes are decoded as UTF-8 before parsing. The canonical row strings are re-encoded to UTF-8 for hashing.

Decision (forever): Strip ONE leading BOM. If the first decoded code point is U+FEFF, it is removed before parsing — and only the single leading one. A BOM-emitting exporter (Excel, etc.) and a non-BOM-emitting one produce the same leaf-set for the same logical content. (Mechanically: parseCsv checks text.charCodeAt(0) === 0xFEFF and slices it off.)

Decision (forever): RFC-4180 quote-aware parse. A field opens a quoted region on a double quote ("); inside a quoted region a "" pair is a literal " and a lone " closes the region; a comma outside quotes ends a field; an unquoted LF / CR / CRLF ends a row. A comma, LF, or CR inside a quoted field is content, not a separator.

Decision (forever): Row break on unquoted LF / CR / CRLF. All three terminators delimit rows; a CRLF is consumed as a single break (the parser advances past the \n after a \r). Line endings are thus normalized implicitly by re-emission (see below) — a CRLF file and the equivalent LF file produce identical canonical rows and identical leaves.

Decision (forever): No trailing-newline empty row. A trailing terminator does not emit an empty final row. Concretely (matching the anchor): after the parse loop, a final row is appended only if the last field or row buffer is non-empty (if (field.length || row.length)). A file ending …,Writer\n and a file ending …,Writer produce the same rows.

Decision (forever): Minimal re-quote per field (csvField). Each parsed field is re-emitted as follows: if the field contains any of the four characters " (double quote), , (comma), LF (0x0A), or CR (0x0D), it is wrapped in " and every internal " is doubled to ""; otherwise it is emitted bare. This is the exact predicate /[",\n\r]/ from csvField.

- Consequence (differs from the retired dotted profile): because re-quoting is minimal, a field that was quoted in the source only to wrap an empty string ("") parses to the empty string and re-emits bare. So a source data row "",x,y canonicalizes to ,x,y. The retired salted profile claimed "" stayed distinct from a bare empty field; under the native rule the parse+minimal-re-quote normalizes them to the same canonical bytes. (Quotes that are required — e.g. around a field containing a comma — are of course preserved, because the field still contains a ,.)

Decision (forever): Canonical row = fields joined by a comma (,); canonical doc = canonical rows joined by \n (LF), no trailing newline. The canonical document (rows joined by LF) is what the csv-norm-v1 content hash covers; the per-row leaves are taken from the canonical rows individually (see §3). No whitespace is trimmed; empty fields are legal (,, is a row of three empty cells: two commas delimit three fields, just as ,,foo,, in fixture N9 is five fields).
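The §2 rules can be sketched in Python as follows. This is an illustrative port written from the prose of this section, not the anchor's parseCsv / csvField / csvRow code; the names parse_csv, csv_field, and canonical_rows are hypothetical, and behavior on inputs this section does not describe (for example blank lines between rows) is an assumption of the sketch.

```python
import re

# Same predicate as the csvField rule: quote iff the field has ", comma, LF or CR.
NEEDS_QUOTE = re.compile(r'[",\n\r]')


def parse_csv(text: str) -> list[list[str]]:
    """Quote-aware parse into rows of fields (sketch of the section 2 rules)."""
    if text.startswith("\ufeff"):          # strip ONE leading BOM only
        text = text[1:]
    rows, row, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"' and i + 1 < n and text[i + 1] == '"':
                field.append('"')          # "" inside quotes is a literal "
                i += 1
            elif ch == '"':
                in_quotes = False          # a lone " closes the quoted region
            else:
                field.append(ch)           # comma / LF / CR here is content
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1                     # CRLF is consumed as a single break
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or row:                       # no empty row after a trailing terminator
        row.append("".join(field))
        rows.append(row)
    return rows


def csv_field(value: str) -> str:
    """Minimal re-quote: wrap and double quotes only when required."""
    if NEEDS_QUOTE.search(value):
        return '"' + value.replace('"', '""') + '"'
    return value


def canonical_rows(raw: bytes) -> list[str]:
    """Decode as UTF-8, parse, and re-emit each row as its canonical string."""
    return [",".join(csv_field(f) for f in r) for r in parse_csv(raw.decode("utf-8"))]
```

Under this sketch, LF, CRLF, BOM-prefixed, and no-trailing-newline variants of the same content yield the same list of canonical rows, and a quoted-empty field re-emits bare.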

3. Leaf extraction — HEADER EXCLUDED

THE FLIP — read carefully. This is the single most important behavioral difference from the retired dotted satsignal.csv.row.v1 profile, which hashed every row including row 0. Under the native csv-row-v1 rule, row 0 is ALWAYS dropped before the leaf-set is built — regardless of whether it is semantically a header. A verifier or redaction tool that includes row 0 as a leaf will compute a different leaf-set and a different root and will not bind to any real anchor.

Decision (forever): The header row (row 0) is EXCLUDED from the leaf-set. After canonicalization (§2), the leaf-set is canonicalLines.slice(1) — the data rows, dropping the first canonical row. (Matching the anchor: const dataRows = canonicalLines.slice(1).)

Decision (forever): Leaf ordering is data-row document order, zero-indexed. Leaf 0 is the first data row, leaf 1 the second, and so on, in the order they appear in the canonicalized file. The merkle leaf-set order is this order, unchanged. The verifier does NOT re-sort. Document order is the only ordering a verifier can derive from the raw bytes without consulting anchorer intent.

Decision (forever): leaf_id is r<N> with N = the DATA-ROW index zero-padded to six decimal digits. Examples: r000000 (first data row), r000001, r042195. The format is the ASCII literal r followed by exactly six ASCII decimal digits of the data-row index (NOT the file-row index). leaf_id is a display / ordering handle only — it is NOT part of any hash preimage (see §4). Six digits support up to 1,000,000 data rows.

Decision (forever): Empty input is invalid; a header-only file has zero data leaves and is invalid input. leaf_count is the data-row count = canonicalLines.length - 1. An empty file (zero bytes, or zero bytes after BOM strip) has no rows and is invalid. A file with exactly one row (a header and no data rows) has zero data leaves: the anchor emits no chunk_merkle for it, so it cannot be a disclosure source and is invalid input under this profile. A valid csv-row-v1 disclosure source has leaf_count ≥ 1 (i.e. ≥ 2 file rows: one header + ≥ 1 data row).
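A minimal sketch of the §3 extraction, assuming the canonical rows have already been produced by §2. The function name extract_leaves and the choice to raise ValueError on invalid input are illustrative assumptions, not part of the anchor code.

```python
def extract_leaves(canonical_lines: list[str]) -> list[tuple[str, str]]:
    """Return (leaf_id, value) pairs for the data rows, in document order."""
    data_rows = canonical_lines[1:]        # row 0 is ALWAYS dropped
    if not data_rows:
        # Empty input and header-only input both have zero data leaves.
        raise ValueError("csv-row-v1 requires at least one data row")
    # leaf_id uses the DATA-ROW index, zero-padded to six decimal digits.
    return [(f"r{i:06d}", row) for i, row in enumerate(data_rows)]
```

For the four canonical rows of the name,age,role example this yields r000000, r000001, and r000002 for Alice, Bob, and Carol; the header never appears.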

4. Leaf hash — bare sha256 of the canonical data row (standard mode)

This section defines the exact bytes that go into SHA-256 to produce a data-row leaf's leaf_hash in standard mode (algo: "sha256").

Decision (forever): The leaf hash is the BARE sha256 of the canonical data-row string's UTF-8 bytes:

leaf_hash = SHA-256( utf8( canonical_data_row ) )

There is no profile literal in the preimage, no leaf_id, no salt, and no 0x00 separators. The preimage is exactly the canonical row's UTF-8 bytes — nothing else. (Matching the anchor: sha256Hex(enc.encode(L)) where L is the canonical data-row string.)

This replaces the retired dotted profile's salted preimage (profile_literal || 0x00 || leaf_id || 0x00 || value || 0x00 || salt_raw) in its entirety. Standard csv-row-v1 carries no salt_b64 on its revealed entries (see §5).

The value a disclosure carries for a revealed standard leaf is the canonical re-quoted row string (§2): the exact bytes the leaf hash covers, including any quote characters the minimal re-quote rule preserved. The verifier hashes utf8(value) and compares to leaf_hash; it does not re-canonicalize.

Worked example (NOT placeholders — computed against the anchor rule)

Input CSV bytes (string view, \n = LF byte 0x0A):

name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n

Canonicalization (§2) yields four canonical rows; row 0 is the header and is EXCLUDED (§3), leaving three data-row leaves:

| leaf_id | file row | value (canonical row) | value UTF-8 (hex) | leaf_hash = sha256(utf8(value)) |
| --- | --- | --- | --- | --- |
|  | 0 (header) | name,age,role (EXCLUDED) |  | not a leaf (528a70… if hashed, but it is never hashed) |
| r000000 | 1 | Alice,42,Engineer | 416c6963652c34322c456e67696e656572 (17 B) | 3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a |
| r000001 | 2 | Bob,35,Designer | 426f622c33352c44657369676e6572 (15 B) | 701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731 |
| r000002 | 3 | Carol,29,Writer | 4361726f6c2c32392c577269746572 (15 B) | f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d |

leaf_count = 3 (data rows; header excluded). A verifier MUST reproduce the three leaf_hash values exactly from the listed value bytes. If it does not, a step in the canonicalization or the bare-sha256 leaf hash is wrong; debug against the fixtures in §8 before doing anything else.
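The standard-mode leaf hash is small enough to sketch directly. The snippet below, using only Python's hashlib, prints the UTF-8 hex and the bare SHA-256 of each data row so the output can be compared by eye against the worked example; the function name standard_leaf_hash is a hypothetical label.

```python
import hashlib


def standard_leaf_hash(value: str) -> str:
    """Bare SHA-256 over the canonical row's UTF-8 bytes; nothing else in the preimage."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


data_rows = ["Alice,42,Engineer", "Bob,35,Designer", "Carol,29,Writer"]
for i, value in enumerate(data_rows):
    # leaf_id is printed for display only; it is not part of the hash preimage.
    print(f"r{i:06d}", value.encode("utf-8").hex(), standard_leaf_hash(value))
```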

5. Salts — standard mode is UNSALTED (privacy posture is first-class)

Decision (forever): Standard csv-row-v1 is UNSALTED. There is no salt in the leaf hash (§4) and salt_b64 is ABSENT from standard revealed-leaf entries (revealed[i] carries {leaf_id, profile, value, leaf_hash, proof_path} — no salt_b64). Do not synthesize an empty or zero salt; the field is simply not present.

Privacy posture (must be understood, not "fixed"). A standard anchor's redacted rows are protected only against a party who cannot enumerate the unknown row content. Understate this at your peril; the honest characterization is stronger than "an incidental proof_path sibling leaks":

This is the documented, anchor-time-chosen cost of standard mode — it is not a defect to "fix" by adding salts. Standard mode gives the disclosed-rows-only membership guarantee while leaving every redacted row recoverable at the cost of its own entropy. Do NOT use standard mode to withhold low-entropy or small-space sensitive rows (anything an adversary could enumerate): route that data to sealed mode (per-leaf HKDF salts, documented in §5b), where redacted rows are unguessable because the verifier cannot derive the per-leaf salt without the anchor's master salt and equal withheld rows do not collide. The user chooses this tradeoff when they anchor (standard vs sealed), not at disclosure time — choose sealed before anchoring if any withheld row could be low-entropy.

What audits MUST NOT re-litigate: do NOT re-propose disclosure-specific salts or a salt keyfile for the standard path, and do NOT treat the standard redacted-row brute-forceability as a defect to patch with salts. Sealed is the answer. No keyless scheme can protect a redacted row that is itself the guessable secret; that is the cost of the no-keyfile requirement and the reason sealed exists.

5b. Sealed mode — HMAC leaf under a per-leaf HKDF salt (algo: "merkle-hmac-sha256")

Sealed mode is the privacy path (§5). A sealed CSV anchor commits the same leaf-set segmentation as standard — same §2 canonicalization, same §3 header-exclusion (slice(1), leaf i = file row i + 1), same §6 duplicate-last merkle — but replaces the bare sha256 leaf (§4) with an HMAC under a per-leaf salt derived from the anchor's master salt by HKDF. The user chooses sealed vs standard at anchor time (§5); a disclosure binds to whichever the anchor committed and never re-derives the mode. The sealed rule below mirrors the anchor (web/templates.py deriveLeafSalt / hmacSha256 / merkleRootFromLeafBytes, sealed CSV leaf loop) and SPEC_v2_sealed.md §3.3 byte-for-byte; it is the authoritative selective-disclosure statement of the same construction.

Decision (forever): The per-leaf salt is HKDF-SHA256 of the master salt, with a fixed namespace and a per-leaf big-endian counter. For data-row leaf index i (zero-based; leaf i is file row i + 1, since the header is already excluded):

salt_i = HKDF-SHA256(
            ikm    = master_salt,                       # the 32-byte bearer secret
            salt   = utf8("satsignal-sealed-v1/per-leaf"),
            info   = utf8("chunk/") || u32_be(i),       # "chunk/" then 4-byte BIG-ENDIAN i
            L      = 32                                  # output length, bytes
         )

Decision (forever): The sealed leaf hash is the HMAC-SHA256 of the canonical data-row's UTF-8 bytes, keyed by that row's per-leaf salt:

leaf_hash_i = HMAC-SHA256( key = salt_i, msg = utf8( canonical_data_row_i ) )

The msg is exactly the §2 canonical re-quoted row string's UTF-8 bytes — identical to the standard-mode preimage (§4); only the construction changes from a bare sha256 to a keyed HMAC under salt_i. There is no profile literal, no leaf_id, and no 0x00 separator in the HMAC message.
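The two sealed-mode formulas above can be sketched with the Python standard library. Python has no built-in HKDF, so hkdf_sha256 below is a hand-written RFC 5869 extract-then-expand; it and the names derive_leaf_salt and sealed_leaf_hash are illustrative assumptions, not the anchor's deriveLeafSalt / hmacSha256 code.

```python
import hashlib
import hmac
import struct


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF with SHA-256 (extract, then expand)."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()            # HKDF-Extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                      # HKDF-Expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_leaf_salt(master_salt: bytes, i: int) -> bytes:
    """Per-leaf salt for data-row leaf index i (zero-based, header excluded)."""
    info = b"chunk/" + struct.pack(">I", i)                       # 4-byte BIG-ENDIAN i
    return hkdf_sha256(master_salt, b"satsignal-sealed-v1/per-leaf", info, 32)


def sealed_leaf_hash(salt_i: bytes, value: str) -> str:
    """HMAC-SHA256 keyed by the per-leaf salt over the canonical row's UTF-8 bytes."""
    return hmac.new(salt_i, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

A caller would compute sealed_leaf_hash(derive_leaf_salt(master_salt, i), row) for each data row i and compare the results with the §5b.3 vectors.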

Decision (forever): The merkle is the §6 duplicate-last tree over the raw 32-byte HMAC commitments. The leaves are the raw 32-byte leaf_hash_i values (not hex, not re-hashed); parents are SHA-256(raw(left) || raw(right)) with the same duplicate-last self-pair on an odd node as standard mode (§6). The inner-node hashing is plain SHA-256 — the per-leaf HMAC is the only salting point, exactly as SPEC_v2_sealed.md §3.3 states. A single-leaf tree's root is that leaf itself.

Decision (forever): The carrier pins algo: "merkle-hmac-sha256" and salt_version: "salt_v1". A sealed CSV anchor stamps subject.proofs.chunk_merkle = {scheme: "csv-row-v1", algo: "merkle-hmac-sha256", salt_version: "salt_v1", leaf_count, root}. The (subject_profile, chunk_merkle.algo) pair ("csv-row-v1", "merkle-hmac-sha256") selects this sealed rule; ("csv-row-v1", "sha256") selects the standard rule (§4). A verifier MUST branch on the carrier algo; the literal alone does not disambiguate.
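Branching on the carrier could look like the sketch below. The function name select_leaf_rule and the returned labels "standard" and "sealed" are hypothetical; only the scheme, algo, and salt_version literals come from this profile.

```python
def select_leaf_rule(chunk_merkle: dict) -> str:
    """Pick the leaf rule from the (scheme, algo) pair, never from the literal alone."""
    scheme = chunk_merkle.get("scheme")
    algo = chunk_merkle.get("algo")
    if scheme != "csv-row-v1":
        # Includes the retired dotted literal: this rule must not be applied to it.
        raise ValueError(f"not a csv-row-v1 carrier: {scheme!r}")
    if algo == "sha256":
        return "standard"
    if algo == "merkle-hmac-sha256":
        if chunk_merkle.get("salt_version") != "salt_v1":
            raise ValueError("sealed carrier must pin salt_version 'salt_v1'")
        return "sealed"
    raise ValueError(f"unsupported algo for csv-row-v1: {algo!r}")
```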

5b.1 What a sealed disclosure carries — per-leaf salt, NEVER the master

Decision (forever): A revealed sealed leaf carries salt_b64 = the PER-LEAF salt salt_i, base64-encoded — NEVER the master salt. A sealed revealed[i] entry is {leaf_id, profile: "csv-row-v1", value: <canonical re-quoted row string>, salt_b64: base64(salt_i), leaf_hash: HMAC(salt_i, utf8(value)), proof_path}. The verifier recomputes the leaf as HMAC-SHA256(base64decode(salt_b64), utf8(value)) and compares it to leaf_hash, then walks proof_path to the committed root (disclosure-v1.md §7 step 4). For sealed leaves salt_b64 is REQUIRED; a sealed carrier with salt_b64 missing fails the disclosure closed.
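The leaf recomputation for a sealed revealed entry can be sketched as below. The function name check_sealed_leaf is hypothetical, the entry is assumed to be an already-parsed dict with the field names listed above, and salt_b64 is assumed to be standard base64 as in the §5b.3 vectors. The proof_path walk is a separate step and is not shown here.

```python
import base64
import hashlib
import hmac


def check_sealed_leaf(entry: dict) -> bool:
    """Recompute HMAC-SHA256(salt_i, utf8(value)) and compare to leaf_hash."""
    salt_b64 = entry.get("salt_b64")
    if not salt_b64:
        return False                       # sealed leaf without a salt fails closed
    try:
        salt_i = base64.b64decode(salt_b64, validate=True)
    except ValueError:                     # binascii.Error is a ValueError subclass
        return False
    value = entry.get("value")
    if not isinstance(value, str):
        return False
    recomputed = hmac.new(salt_i, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return recomputed == entry.get("leaf_hash")
```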

Decision (forever) — THE MASTER-SALT-STRIP RULE (security requirement, forever). A csv-row-v1 sealed disclosure .mbnt MUST NOT contain the source anchor's manifest.json, the 32-byte master salt, or any field carrying it (salt_b64 at the manifest/bearer level, bearer_secret: true, etc.). The disclosure carries only the per-leaf salts of the revealed rows, inside each revealed[i] entry. The redact-from-original tool reads the master salt from the source .mbnt manifest.json, derives the per-leaf salts of the revealed rows only, emits them in revealed[], and strips the master salt from all output. This is forever-load-bearing: shipping the master salt in a disclosure lets anyone re-derive every per-leaf salt via HKDF and brute-force (or directly recompute the HMAC of) every redacted row — it unseals the entire table. A disclosure that ships the master salt has defeated the whole point of sealed mode.
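A redaction tool might add a last-line guard before writing a sealed disclosure, sketched below. This is a hypothetical belt-and-braces check (the name leaks_master_salt and the scan-the-serialized-JSON approach are assumptions of this sketch); it only detects the master salt in hex or base64 form and is no substitute for never copying manifest.json into the output in the first place.

```python
import base64
import json


def leaks_master_salt(disclosure: dict, master_salt: bytes) -> bool:
    """Return True if the serialized disclosure contains the master salt in a common encoding."""
    text = json.dumps(disclosure)
    needles = {
        # Padding is stripped so both padded and unpadded forms are caught.
        base64.b64encode(master_salt).decode().rstrip("="),
        base64.urlsafe_b64encode(master_salt).decode().rstrip("="),
    }
    if master_salt.hex() in text.lower():  # hex, either case
        return True
    return any(needle in text for needle in needles)
```

A tool using this sketch would refuse to emit the disclosure when the function returns True.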

Why revealing per-leaf salts of revealed rows is safe. HKDF-Expand is a PRF: revealing salt_i (the output for info = "chunk/" || u32_be(i)) leaks neither the IKM (master_salt) nor any other output salt_j (j ≠ i). So a sealed disclosure that publishes salt_i for each revealed row lets a verifier recompute exactly those rows' HMAC leaves and prove their membership, while every redacted row's commitment stays opaque — its per-leaf salt is underivable without the master salt, and HMAC under an unknown 32-byte key is not brute-forceable even for a low-entropy row value. The formal argument is SPEC_v2_sealed.md §5.3.

5b.2 Privacy posture

Sealed mode is the answer to standard mode's brute-forceability (§5). A standard anchor's redacted rows are confirmable in one sha256 by a party who can guess a low-entropy row. A sealed anchor's redacted rows are unguessable: the leaf is HMAC(salt_i, row) under a per-leaf salt the verifier cannot derive without the master salt, so no candidate row can be tested against an undisclosed commitment. This is the privacy path the user opts into at anchor time (§5). It is achieved with no extra keyfile — the per-leaf salts are HKDF-derived from the master salt the anchor already persisted in the source .mbnt. What audits MUST NOT re-litigate: sealed is the privacy answer; standard's brute-forceability is the documented, anchor-time-chosen cost of the no-keyfile standard path, not a defect to patch with salts.

5b.3 Worked example (NOT placeholders — computed against the anchor rule)

These values are computed by HKDF-SHA256 + HMAC-SHA256 + the duplicate-last merkle over the exact same primitives the browser anchor uses (deriveLeafSalt / hmacSha256 / merkleRootFromLeafBytes). They are NOT placeholders. A frozen sealed corpus lands at tests/vectors/disclosure-v1/csv_row_v1_sealed/, and these inline vectors keep this section self-contained.

Fixed test master salt = the 32 bytes 0x00 0x01 … 0x1f (000102…1e1f). In a sealed source .mbnt, this is manifest.json's salt_b64 (base64url, the bearer secret, stripped from any disclosure):

master_salt (hex) = 000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
manifest.json salt_b64 (base64url, NEVER shipped in a disclosure)
                  = AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8

Input CSV (same as §4's example; header name,age,role is row 0 and is EXCLUDED, leaving three data-row leaves):

name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n

Per-leaf salts (salt_i = HKDF(master_salt, "satsignal-sealed-v1/per-leaf", "chunk/"||u32_be(i), 32)) and leaves (leaf_hash_i = HMAC(salt_i, utf8(value))):

| leaf_id | file row | value (canonical row) | salt_b64 = base64(salt_i), the PER-LEAF salt (std base64) | leaf_hash = HMAC(salt_i, utf8(value)) |
| --- | --- | --- | --- | --- |
| r000000 | 1 | Alice,42,Engineer | qMoAQfGknOScBChILtGnu9aA1VXa16fyY79Nvu/NOUA= | b4d1776516e344977142e8605cc5c23cb28b3590cf6f6ff38078acb774b851b9 |
| r000001 | 2 | Bob,35,Designer | nE9TkuI5Ift2NOmTOuzvcRkUcnxu4iWJyOUW//86buE= | d725af39959bef81bf9ece86a6509622cfe3581a27a353a0fd6098a6150bbb07 |
| r000002 | 3 | Carol,29,Writer | dmVYjcVZH2NJO8hMQMr4GcbCXyVA168ORyLoUiP7Occ= | 09a03eac822c3c2c0dbd685722d2b5654b6572c53e05f22acb41d87fc3d9275d |

leaf_count = 3. The salt_b64 values above are the per-leaf HKDF outputs (std base64, 32 bytes each); each appears in its row's revealed[] entry. None of them is the master salt — a disclosure revealing all three still leaks nothing about the master or about any redacted row in a larger table (HKDF-Expand is a PRF). Note also that the leaf_hash values differ entirely from §4's bare-sha256 leaves for the same rows: a sealed carrier and a standard carrier of the same CSV commit to different roots — they are distinct anchors, distinguished by algo.

Merkle (duplicate-last over the raw 32-byte HMAC commitments; L0[i] = leaf_hash_i):

L1[0] = SHA-256( raw(L0[0]) || raw(L0[1]) )
      = cdfce3bff059980a6fabfb54a8e84091cc9f72f5a8d6251f3726decaa38eb45b

L1[1] = SHA-256( raw(L0[2]) || raw(L0[2]) )   ← DUPLICATE-LAST (Carol self-pairs)
      = b9a45f0dca4ec4eabaf6ca7a6835b9511da3f6bc62247fa08fc1581ab860676b

ROOT  = SHA-256( raw(L1[0]) || raw(L1[1]) )
      = 2207e09f1cafe3cb7099d905d47eef8c998d42a0b2413b3a0a0413110f47f6a3

Proof paths a sealed disclosure carries to reveal each leaf (all walk to ROOT — identical tree shape to §6's standard example, including Carol's two-entry self-sibling path; only the underlying hash values differ because the leaves are HMACs):

reveal r000000 (Alice):
  salt_b64   = "qMoAQfGknOScBChILtGnu9aA1VXa16fyY79Nvu/NOUA="
  leaf_hash  = "b4d1776516e344977142e8605cc5c23cb28b3590cf6f6ff38078acb774b851b9"
  proof_path = [
    { "side": "R", "hash": "d725af39959bef81bf9ece86a6509622cfe3581a27a353a0fd6098a6150bbb07" },  // L0[1] (Bob)
    { "side": "R", "hash": "b9a45f0dca4ec4eabaf6ca7a6835b9511da3f6bc62247fa08fc1581ab860676b" }   // L1[1]
  ]

reveal r000001 (Bob):
  salt_b64   = "nE9TkuI5Ift2NOmTOuzvcRkUcnxu4iWJyOUW//86buE="
  leaf_hash  = "d725af39959bef81bf9ece86a6509622cfe3581a27a353a0fd6098a6150bbb07"
  proof_path = [
    { "side": "L", "hash": "b4d1776516e344977142e8605cc5c23cb28b3590cf6f6ff38078acb774b851b9" },  // L0[0] (Alice)
    { "side": "R", "hash": "b9a45f0dca4ec4eabaf6ca7a6835b9511da3f6bc62247fa08fc1581ab860676b" }   // L1[1]
  ]

reveal r000002 (Carol):  ← the odd last node, TWO-entry self-sibling path
  salt_b64   = "dmVYjcVZH2NJO8hMQMr4GcbCXyVA168ORyLoUiP7Occ="
  leaf_hash  = "09a03eac822c3c2c0dbd685722d2b5654b6572c53e05f22acb41d87fc3d9275d"
  proof_path = [
    { "side": "R", "hash": "09a03eac822c3c2c0dbd685722d2b5654b6572c53e05f22acb41d87fc3d9275d" },  // L0[2] ITSELF — self-sibling
    { "side": "L", "hash": "cdfce3bff059980a6fabfb54a8e84091cc9f72f5a8d6251f3726decaa38eb45b" }   // L1[0]
  ]

A sealed verifier MUST reproduce each leaf_hash exactly from HMAC(base64decode(salt_b64), utf8(value)) and then walk the path to ROOT. If it does not, the HKDF derivation, the HMAC, or the duplicate-last merkle is wrong; debug against this vector before anything else.

6. Merkle behavior — DUPLICATE-LAST on odd

Decision (forever): The merkle is DUPLICATE-LAST on odd nodes, matching merkleRootFromHexLeaves. At each level, nodes are paired left-to-right; when a node is unpaired (the last node at an odd-count level), its right sibling is itself (right = (i+1 < len) ? level[i+1] : level[i]). The parent is SHA-256(raw(left) || raw(right)) — raw 32-byte concatenation, no domain tag. A single-leaf tree's root is that leaf itself (proof_path = []).
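A minimal sketch of the duplicate-last root, written from the rule above rather than copied from merkleRootFromHexLeaves; the Python function name and the ValueError on an empty leaf-set are assumptions.

```python
import hashlib


def merkle_root_duplicate_last(leaf_hexes: list[str]) -> str:
    """Root over 64-hex leaves; an unpaired last node pairs with itself."""
    if not leaf_hexes:
        raise ValueError("a csv-row-v1 tree needs at least one leaf")
    level = [bytes.fromhex(h) for h in leaf_hexes]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else level[i]   # self-pair on odd
            nxt.append(hashlib.sha256(left + right).digest())          # raw concat, no tag
        level = nxt
    return level[0].hex()                  # single leaf: the root is the leaf itself
```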

This replaces the retired dotted profile's promote-unchanged odd-node rule (where an unpaired node was lifted to the next level without re-hashing). Promote-unchanged and duplicate-last produce different roots for any odd-count level. The native anchor is duplicate-last; this profile is duplicate-last.

The disclosure verifier never rebuilds the root — it only walks proof_path (decode each 64-hex sibling and the frontier to raw bytes, concatenate sibling||frontier for side:"L" or frontier||sibling for side:"R", SHA-256, repeat; the final frontier MUST equal the committed root). The walk is structure-agnostic: it folds whatever siblings it is given and verifies both promote-unchanged and duplicate-last paths. Only the proof-path builder (the redact-from-original tool) encodes the duplicate-last shape — by emitting a self-sibling entry for an odd-promoted node.
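The walk described above can be sketched as a short fold. The function name walk_proof_path is hypothetical, and returning False on malformed hex or an unknown side is an assumption about failure handling, not something this profile specifies.

```python
import hashlib


def walk_proof_path(leaf_hash_hex: str, proof_path: list[dict], root_hex: str) -> bool:
    """Fold the siblings into the frontier and compare the result to the committed root."""
    try:
        frontier = bytes.fromhex(leaf_hash_hex)
        for step in proof_path:
            sibling = bytes.fromhex(step["hash"])
            if step["side"] == "L":
                frontier = hashlib.sha256(sibling + frontier).digest()   # sibling on the left
            elif step["side"] == "R":
                frontier = hashlib.sha256(frontier + sibling).digest()   # sibling on the right
            else:
                return False
    except (KeyError, ValueError):
        return False
    return frontier.hex() == root_hex.lower()
```

Because the fold only consumes the siblings it is handed, the same function accepts a two-entry self-sibling path and a one-entry path alike; the tree shape lives entirely in the builder.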

Worked example (the §4 three-leaf tree)

Leaves (data rows; L0[i] = leaf i):

L0[0] = 3147617d…800a   (Alice,42,Engineer)
L0[1] = 701287f2…9731   (Bob,35,Designer)
L0[2] = f5edf8ce…988d   (Carol,29,Writer)

Level 1 (3 leaves → odd; the last node self-pairs under duplicate-last):

L1[0] = SHA-256( raw(L0[0]) || raw(L0[1]) )
      = 4d6704c7c8fe0ad82fefbd7c7b530d8eb6087ff369568d6e763801ab9f07b5e6

L1[1] = SHA-256( raw(L0[2]) || raw(L0[2]) )   ← DUPLICATE-LAST (self-pair), NOT promote-unchanged
      = e49b438fe484c909f9795172f8aea598123c422bcf7e11f257c53ecd5875609d

ROOT  = SHA-256( raw(L1[0]) || raw(L1[1]) )
      = 19d82f92265bc904b4f356b1f69bb418e96bca56e57785d2d1ae7c1acc8d5e3e

Tree:

                                   ROOT 19d82f…5e3e
                                        |
                    --------------------------------------
                    |                                    |
              L1[0] 4d6704…b5e6                   L1[1] e49b43…609d
              = H( L0[0] || L0[1] )               = H( L0[2] || L0[2] )  (self-pair)
                    |                                    |
            ----------------                      (Carol pairs with herself)
            |              |                             |
        L0[0]            L0[1]                          L0[2]
      Alice,42,…       Bob,35,…                       Carol,29,…

Proof paths a disclosure carries to reveal each leaf (all walk to ROOT):

reveal r000000 (Alice):
  proof_path = [
    { "side": "R", "hash": "701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731" },  // L0[1] (Bob)
    { "side": "R", "hash": "e49b438fe484c909f9795172f8aea598123c422bcf7e11f257c53ecd5875609d" }   // L1[1]
  ]

reveal r000001 (Bob):
  proof_path = [
    { "side": "L", "hash": "3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a" },  // L0[0] (Alice)
    { "side": "R", "hash": "e49b438fe484c909f9795172f8aea598123c422bcf7e11f257c53ecd5875609d" }   // L1[1]
  ]

reveal r000002 (Carol):  ← the odd last node, TWO-entry path
  proof_path = [
    { "side": "R", "hash": "f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d" },  // L0[2] ITSELF — self-sibling
    { "side": "L", "hash": "4d6704c7c8fe0ad82fefbd7c7b530d8eb6087ff369568d6e763801ab9f07b5e6" }   // L1[0]
  ]

The Carol path is the duplicate-last signature. Carol is the odd-promoted node at level 0. Under duplicate-last her sibling at level 0 is her own leaf hash ({side:"R", hash: L0[2]}) — folding H(L0[2]||L0[2]) = L1[1] — and then she folds against L1[0] at level 1. So her path has two entries. Under the retired promote-unchanged rule the same reveal had a one-entry path (Carol skipped level 0 and folded directly against L1[0]). The proof-path walk verifies both; only the builder differs. This is the canonical illustration of the odd-node rule for csv-row-v1.
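A proof-path builder that emits the self-sibling entry might look like the sketch below. It is an illustration of the duplicate-last shape only; the name build_proof_path is hypothetical and this is not the redact-from-original tool's code.

```python
import hashlib


def build_proof_path(leaf_hexes: list[str], index: int) -> list[dict]:
    """Build the duplicate-last proof path for the leaf at `index`."""
    if not 0 <= index < len(leaf_hexes):
        raise IndexError("leaf index out of range")
    level = [bytes.fromhex(h) for h in leaf_hexes]
    path, pos = [], index
    while len(level) > 1:
        if pos % 2 == 0:
            # Right sibling, or the node ITSELF when it is the unpaired last node.
            sibling = level[pos + 1] if pos + 1 < len(level) else level[pos]
            path.append({"side": "R", "hash": sibling.hex()})
        else:
            path.append({"side": "L", "hash": level[pos - 1].hex()})
        # Build the next level with the same duplicate-last pairing.
        level = [
            hashlib.sha256(level[i] + (level[i + 1] if i + 1 < len(level) else level[i])).digest()
            for i in range(0, len(level), 2)
        ]
        pos //= 2
    return path
```

For a three-leaf tree, index 2 produces two entries: the leaf's own hash on side R, then L1[0] on side L, matching the Carol path above.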

7. Original anchor binding

A standard CSV anchor commits the csv-row-v1 leaf-set under the original .mbnt canonical document's subject.proofs.chunk_merkle field. The required pins (matching what buildCsvProofs stamps):

| canonical field | required value under this profile |
| --- | --- |
| subject.proofs.chunk_merkle.scheme | exactly "csv-row-v1" (the hyphenated literal anchors emit) |
| subject.proofs.chunk_merkle.algo | "sha256" (standard mode; sealed mode uses "merkle-hmac-sha256") |
| subject.proofs.chunk_merkle.leaf_count | the data-row count (header excluded, §3) |
| subject.proofs.chunk_merkle.root | duplicate-last merkle root over the data-row leaves (§6) |

A disclosure under disclosure-v1.md carries this same literal in disclosure.linked_anchor.subject_profile; each revealed leaf's profile field MUST equal "csv-row-v1". The verifier binds to the chunk_merkle the anchor already committed — there is no re-anchor and no new scheme. The binding chain (master spec §4) walks revealed[i].value → leaf_hash → linked_anchor.root → original canonical-doc chunk_merkle.root → on-chain document_hash.

Forbidden-variant note (flipped). A verifier MUST apply this native rule (bare-sha256 leaf, header-excluded, duplicate-last merkle) to an anchor whose chunk_merkle.scheme == "csv-row-v1" and algo == "sha256". It MUST NOT apply the retired dotted profile's salted preimage to a csv-row-v1 anchor. The dotted literal satsignal.csv.row.v1 is a different, deprecated scheme (§9); if a verifier sees that dotted scheme it MUST NOT apply this native rule to it, and vice versa. The distinguishing key is the (subject_profile, chunk_merkle.algo) pair: ("csv-row-v1", "sha256") is this standard rule; ("csv-row-v1", "merkle-hmac-sha256") is the sealed rule (§5b).

8. Fixtures (test vectors)

All leaf_hash / root values below were computed by SHA-256 over the bytes defined in §4/§6 (verified against the anchor code and the reference implementation). They are NOT placeholders. A verifier that does not reproduce them from the listed inputs has a bug.

A frozen native corpus lands at tests/vectors/disclosure-v1/csv_row_v1_native/: positive disclosures plus negatives (tampered value → leaf_hash_mismatch, wrong proof path → merkle_path_mismatch, header-included mistake). The following inline vectors keep this spec self-contained (a profile spec is not complete without vectors).

N1: minimal — header + 3 data rows, LF endings

Input (string view; \n = LF):

name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n

Canonical rows: ["name,age,role", "Alice,42,Engineer", "Bob,35,Designer", "Carol,29,Writer"]. Header name,age,role is row 0 and is EXCLUDED. leaf_count = 3. Leaves, tree, and the three proof paths are the §4/§6 worked example:

| leaf_id | value | leaf_hash |
| --- | --- | --- |
| r000000 | Alice,42,Engineer | 3147617d8c181d8e8a1748b8c9642bf9dd2c33d0b2b13da2dddf897e6139800a |
| r000001 | Bob,35,Designer | 701287f253f32674ccef5ea56003421c7fe8fb87eedf58042ce133473e1b9731 |
| r000002 | Carol,29,Writer | f5edf8ce0f5e68dffbd0274e9af59f001102dfd84cecab4caf81e1e7296c988d |

root = 19d82f92265bc904b4f356b1f69bb418e96bca56e57785d2d1ae7c1acc8d5e3e. Revealed entries carry no salt_b64.

N2: CRLF input — same content as N1 with \r\n between rows

name,age,role\r\nAlice,42,Engineer\r\nBob,35,Designer\r\nCarol,29,Writer\r\n

Canonical rows are byte-identical to N1 (CRLF row-breaks re-emit as LF-joined canonical rows; leaves are per-row strings, unaffected). leaf_count = 3; the three leaves and root are identical to N1. Pins that CRLF and LF sources produce the same leaf-set.

N3: BOM-prefixed input — U+FEFF then N1's content

[U+FEFF] name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer\n

The single leading BOM is stripped before parsing; canonical rows, leaves, and root are identical to N1. Pins that a BOM-emitting exporter and a non-BOM one produce the same leaf-set.

N4: trailing newline absent — N1's content without the final \n

name,age,role\nAlice,42,Engineer\nBob,35,Designer\nCarol,29,Writer

No trailing terminator → no empty final row (§2). Canonical rows, leaves, and root are identical to N1. Pins that the leaf hash covers the row's content, not any terminator.

N5: even leaf count — header + 2 data rows (clean pairing)

h\nAlice,42,Engineer\nBob,35,Designer

leaf_count = 2; leaves r000000 (Alice,42,Engineer, 3147617d…800a) and r000001 (Bob,35,Designer, 701287f2…9731). With an even count there is no self-pair:

root = SHA-256( raw(L0[0]) || raw(L0[1]) )
     = 4d6704c7c8fe0ad82fefbd7c7b530d8eb6087ff369568d6e763801ab9f07b5e6

(Note this equals L1[0] of N1, as expected.) Proof paths: r000000 → [{R, 701287f2…9731}]; r000001 → [{L, 3147617d…800a}].

N6: quoted comma — data row "Smith, John",42,Engineer

Source (header col + one data row): col\n"Smith, John",42,Engineer. The comma inside the quoted field is content; the field still contains a , so the minimal re-quote preserves the quotes. leaf_count = 1 (single-leaf tree, proof_path = [], root == leaf):

| leaf_id | value | leaf_hash |
| --- | --- | --- |
| r000000 | "Smith, John",42,Engineer | 233e061e7c3a3ddca8bc8812161444335337c4af179f556443e9e03dac05c34e |

Pins quote-aware parsing (no split on the embedded comma) and quote preservation for a required quote.

N7: escaped quote — data row "He said ""hi""",x,y

Source: col\n"He said ""hi""",x,y. The "" pairs are a literal " inside the quoted field; the field contains ", so minimal re-quote preserves the wrapping quotes and the doubled internal quotes verbatim. leaf_count = 1:

| leaf_id | value | leaf_hash |
| --- | --- | --- |
| r000000 | "He said ""hi""",x,y | c0ccf8ba1b4cb1731873f2d907935949d8baa5682c4c9df1ed016e6ff8b869ba |

Pins that the canonical bytes carry the "" escape verbatim (the canonicalizer does NOT decode "" to " before hashing).

N8: embedded LF in a quoted field — data row "line1\nline2",x

Source: col\n"line1\nline2",x where \n (0x0A) sits inside the quoted field (not a row break). RFC-4180 quoting keeps it as content; the field contains \n, so minimal re-quote preserves the wrapping quotes. leaf_count = 1 (the embedded LF does NOT split the row):

| leaf_id | value (escaped) | leaf_hash |
| --- | --- | --- |
| r000000 | "line1\nline2",x | d6af32b6bb9204df6131de7052a1631afcd99014ecb4e8a1825a41cdf60b4c0f |

(The \n in value is the literal LF byte between line1 and line2.) Pins that quote-aware row splitting is part of the contract: an LF inside a quoted field is content, not a terminator.

N9: empty cells — data row ,,foo,,

Source: h1,h2,h3,h4,h5\n,,foo,,. The data row has five fields (four empty + foo); none needs quoting, so all re-emit bare. leaf_count = 1:

| leaf_id | value | leaf_hash |
| --- | --- | --- |
| r000000 | ,,foo,, | 52ed76e0ab0db728fcfb6631d642019cf766a28767bb2eb0612cff50d7953e9e |

Pins that empty cells are legal and the delimiters at empty-cell positions are part of the canonical bytes.

N10: quoted-empty normalizes to bare — data row "",x,y

Source: col\n"",x,y. The first field is an empty quoted string; it parses to the empty string, which contains none of " , LF CR, so minimal re-quote emits it bare. The canonical row is therefore ,x,y (NOT "",x,y). leaf_count = 1:

| leaf_id | source field | canonical value | leaf_hash |
| --- | --- | --- | --- |
| r000000 | "" (quoted) | ,x,y | a8c72b638eee690282c29d60ecd295e2454d8c938b14af94f84fabc651b999e6 |

Pins the minimal-re-quote consequence (§2): under the native rule a source "",x,y and a source ,x,y canonicalize to the same bytes and hash to the same leaf. (This is the documented behavioral difference from the retired dotted profile, which kept "" distinct.)

9. Out of scope / deprecation pointers

Every future profile gets its own literal. This profile's literal is the hyphenated csv-row-v1, fixed forever as the rule on-chain CSV anchors already commit, applied to anchors whose subject_profile == "csv-row-v1" under either carrier algo: chunk_merkle.algo == "sha256" selects the standard rule (§§2–8), chunk_merkle.algo == "merkle-hmac-sha256" (salt_version: "salt_v1") selects the sealed rule (§5b). A verifier distinguishes the two by the (subject_profile, chunk_merkle.algo) pair.

Questions about this specification? Email hello@satsignal.cloud.