text-line-v1 — selective-disclosure profile for TEXT line leaves (native anchor rule)

Status: active. This profile is the disclosure-side write-down of the per-line chunk_merkle rule every text anchor already commits — it adds no new on-chain behavior. It repeats, for text, exactly what csv-row-v1.md does for CSV. subject_profile literal: text-line-v1 (hyphenated — the literal text anchors stamp into subject.proofs.chunk_merkle.scheme). This is a NATIVE profile: the leaf is the bare sha256(utf8(line)) (standard) or HMAC(per-leaf HKDF salt, utf8(line)) (sealed) — never the salted framed preimage of the deprecated satsignal.text.paragraph_sentence.v1.


1. Why this exists

A selective disclosure proves a revealed unit into the exact merkle leaf the anchor already committed on chain. Text anchors chunk a .txt / .md file by line under chunk_merkle.scheme = "text-line-v1". To redact a text file you already anchored — revealing some lines, withholding others — the disclosure leaf rule MUST equal the anchor's text-line-v1 leaf rule byte-for-byte. This profile pins that rule.

Granularity is LINE (locked). Sentence-level redaction has no on-chain sentence leaves to prove into and is a planned FUTURE anchor scheme (text-sentence-v1), not this profile. The deprecated satsignal.text.paragraph_sentence.v1 profile cannot bind to a live text anchor (salted/framed leaf + sentence granularity); see §9.


2. Inputs and canonicalization (text-norm-v1)

The leaf-set is computed from the original file bytes via the SAME canon the anchor applies (web/templates.py normalizeTextForCanonical / normalizeText, byte-identical in the standard and sealed anchor branches):

  1. BOM strip: remove ONE leading U+FEFF if present.
  2. NFC: String.prototype.normalize("NFC") over the whole string.
  3. Line endings: replace \r\n and a lone \r with \n (regex /\r\n?/g → \n).
  4. Per-line trailing whitespace: strip a trailing run of spaces and tabs (/[ \t]+$/) from each line. Interior whitespace is preserved.

The decoder is lenient UTF-8 (invalid bytes → U+FFFD), matching the anchor's file.text(). A file the anchor accepted recomputes to the same leaves here; a true mismatch surfaces as the distinct recompute-mismatch failure (§7), never a silent reject.
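
The canon above can be sketched in a few lines. This is an illustration only, not the anchor's code: decodeLenient and canonicalizeText are hypothetical names, and the sketch assumes the decoder leaves a leading BOM in place so that step 1 is what removes it.

```javascript
// Sketch of text-norm-v1 as described in this section.
// Helper names are hypothetical; the anchor's own canon lives in web/templates.py.

// Lenient UTF-8 decode: invalid bytes become U+FFFD instead of throwing.
// ignoreBOM: true keeps a leading U+FEFF so that step 1 below strips it.
function decodeLenient(bytes) {
  return new TextDecoder("utf-8", { fatal: false, ignoreBOM: true }).decode(bytes);
}

function canonicalizeText(text) {
  // 1. BOM strip: remove ONE leading U+FEFF if present.
  let s = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  // 2. NFC over the whole string.
  s = s.normalize("NFC");
  // 3. Line endings: \r\n and a lone \r both become \n.
  s = s.replace(/\r\n?/g, "\n");
  // 4. Strip trailing spaces and tabs per line; interior whitespace is preserved.
  return s
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))
    .join("\n");
}

const sample = "\uFEFFalpha  \r\n\r\n  beta\tgamma\t\rdelta";
console.log(JSON.stringify(canonicalizeText(sample)));
```

Blank lines survive canonicalization; they are dropped only at leaf extraction (§3).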

The content-canonical hash is sha256 of the full canonical string under scheme text-norm-v1; it is not part of the per-leaf rule but is the anchor's content_canonical.


3. Leaf extraction — split on \n, DROP empty lines, NO header

After canonicalization, segment into leaves:

leaves = canonical.split("\n").filter(L => L.length > 0)

A file with zero non-empty lines is not a valid text-line-v1 disclosure source (no leaves to prove); the tool fails with invalid_text_empty.
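
A minimal sketch of the segmentation rule, assuming the canonical string from §2 as input. The function name extractLeaves is hypothetical, and the leaf_id format ("l" plus a six-digit zero-padded index) is inferred from the worked examples in this profile rather than stated as a rule here.

```javascript
// Sketch of §3: split on \n, drop empty lines, no header row.
// extractLeaves is a hypothetical name, not the tool's API.
function extractLeaves(canonical) {
  const lines = canonical.split("\n").filter((line) => line.length > 0);
  if (lines.length === 0) {
    // No leaves to prove: fail with the code named in this section.
    const err = new Error("invalid_text_empty");
    err.code = "invalid_text_empty";
    throw err;
  }
  // ASSUMPTION: leaf ids count the surviving (non-empty) lines from zero.
  return lines.map((value, i) => ({
    leaf_id: "l" + String(i).padStart(6, "0"),
    value,
  }));
}

console.log(extractLeaves("first\n\nsecond\n"));
```

Because indices are assigned after the empty-line filter, a blank line in the source does not consume a leaf_id.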


4. Leaf hash — bare sha256 of the canonical line (standard mode)

For a STANDARD text anchor (chunk_merkle.algo == "sha256"):

leaf_hash_i = sha256( utf8( canonical_line_i ) )

Bare — no profile literal, no leaf_id, no salt, no 0x00 separators. A standard text-line-v1 revealed[i] carries {leaf_id, profile: "text-line-v1", value: <canonical line string>, leaf_hash, proof_path} and no salt_b64 (§5).

The verifier's value→bytes rule is utf8(value); it recomputes sha256(utf8(value)) and compares to the published leaf_hash, then walks proof_path to linked_anchor.root.
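
The standard leaf rule and the verifier's hash comparison can be sketched as below. The helper names are hypothetical, the sketch assumes leaf_hash is published as lowercase hex, and the proof_path walk is deliberately left out (see §6).

```javascript
const { createHash } = require("node:crypto");

// Bare leaf rule: sha256 over the UTF-8 bytes of the canonical line, nothing else.
function leafHashStandard(canonicalLine) {
  return createHash("sha256").update(canonicalLine, "utf8").digest("hex");
}

// Verifier-side comparison for one revealed entry (hypothetical helper).
// ASSUMPTION: leaf_hash is a hex string; case is normalized before comparing.
function revealedLeafMatches(revealed) {
  return leafHashStandard(revealed.value) === revealed.leaf_hash.toLowerCase();
}

const line = "First line of the memo.";
console.log(leafHashStandard(line));
console.log(revealedLeafMatches({ value: line, leaf_hash: leafHashStandard(line) }));
```

Since the preimage is the canonical line alone, a single stray trailing space in value produces a different hash; the value must be the §2 canonical form.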

Worked example (NOT placeholders — computed against the anchor rule)

Source file bytes (BOM + CRLF, a blank line, and a trailing-whitespace line):

  "\xEF\xBB\xBF" "First line of the memo.\r\n" "\r\n"
  "Second line has trailing spaces.   \r\n" "Third and final line.\r\n"

Canon + segmentation → 3 non-empty-line leaves (the blank line dropped; line 3's trailing spaces stripped):

leaf_id  value (canonical line)            sha256(utf8(value))
l000000  First line of the memo.           8187a5534ddc483f4c936f872837a8bf39d2d225f18307c496954cbe7dffe119
l000001  Second line has trailing spaces.  e3baa7985e2bcdf6cf2d7363cec05456728a2a5d3f98b37542392f24e9a72162
l000002  Third and final line.             dd52bddca31b1a03b84525208f2d4e7dd2132f4b90621dd674eb3a69c9a2c428

These are frozen in tests/vectors/disclosure-v1/text_line_v1_native/N1.fixture.json.


5. Salts — standard mode is UNSALTED (privacy posture is first-class)

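Standard text-line-v1 leaves are unsalted bare sha256(utf8(line)). The honest characterization is stronger than "an incidental proof_path sibling leaks": a bare hash is deterministic and keyless, so anyone holding a redacted line's leaf hash can test guesses against it, and equal withheld lines produce equal leaves.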

This is the anchor-time-chosen tradeoff, not a defect — the discloser accepted it by anchoring in standard mode. Do NOT use standard mode to withhold low-entropy or small-space sensitive lines; route that data to sealed mode (§5b), where redacted lines are unguessable and equal withheld lines do not collide. No keyless scheme can protect a redacted line that is itself the guessable secret; that is the cost of the no-keyfile requirement and the reason sealed exists. Choose sealed before anchoring if any withheld line could be low-entropy.
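
To make the risk concrete, the sketch below shows how a guessable redacted line falls to enumeration under the standard rule. It is illustrative only: guessRedactedLine and the candidate list are hypothetical, and the attacker is assumed to hold the redacted line's leaf hash.

```javascript
const { createHash } = require("node:crypto");

const sha256Hex = (s) => createHash("sha256").update(s, "utf8").digest("hex");

// Try each candidate canonical line against a known leaf hash.
// Works only because the standard leaf has no key and no salt.
function guessRedactedLine(leafHashHex, candidates) {
  for (const candidate of candidates) {
    if (sha256Hex(candidate) === leafHashHex) return candidate;
  }
  return null;
}

// A withheld line drawn from a tiny space of possible values.
const withheldLeaf = sha256Hex("Approved: yes");
console.log(guessRedactedLine(withheldLeaf, ["Approved: no", "Approved: yes"]));
```

Two candidates suffice here; any line whose plausible values can be enumerated is exposed the same way, which is why such lines belong in sealed mode.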

A standard revealed[i] MUST NOT carry salt_b64. The structural schema treats salt_b64 as optional for text-line-v1; the verifier ignores any stray salt_b64 under the bare-sha256 standard rule.


5b. Sealed mode — HMAC leaf under a per-leaf HKDF salt (algo: "merkle-hmac-sha256")

For a SEALED text anchor (chunk_merkle.algo == "merkle-hmac-sha256", chunk_merkle.salt_version == "salt_v1"), the leaf is keyed:

salt_i     = HKDF-SHA256(ikm = master_salt,
                         salt = "satsignal-sealed-v1/per-leaf",
                         info = "chunk/" || u32_be(i), L = 32)
leaf_hash_i = HMAC-SHA256(key = salt_i, msg = utf8(canonical_line_i))

This is the same per-leaf HKDF/HMAC derivation the sealed CSV anchor uses — the anchor's sealed merkle assembly is generic across file types (web/templates.py ~6056). Only the leaf hash differs from standard; canonicalization (§2), segmentation (§3), and the merkle (§6) are identical.

A sealed revealed[i] carries salt_b64 = base64(salt_i) — the PER-LEAF salt for that revealed line. salt_b64 is REQUIRED for a sealed leaf; a sealed carrier with a revealed leaf missing salt_b64 fails closed with sealed_leaf_missing_salt.
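
A sketch of the sealed derivation and the fail-closed salt check, using Node's built-in HKDF and HMAC. Several details are assumptions, not statements of this profile: the index i is taken as the zero-based leaf index after empty lines are dropped, the HKDF salt string is encoded as UTF-8, salt_b64 is decoded as standard base64, and all function names are hypothetical.

```javascript
const { hkdfSync, createHmac } = require("node:crypto");

// ASSUMPTION: the HKDF salt literal is fed in as its UTF-8 bytes.
const HKDF_SALT = Buffer.from("satsignal-sealed-v1/per-leaf", "utf8");

// salt_i = HKDF-SHA256(ikm = master_salt, salt, info = "chunk/" || u32_be(i), L = 32)
function perLeafSalt(masterSalt, i) {
  const index = Buffer.alloc(4);
  index.writeUInt32BE(i);
  const info = Buffer.concat([Buffer.from("chunk/", "utf8"), index]);
  return Buffer.from(hkdfSync("sha256", masterSalt, HKDF_SALT, info, 32));
}

// leaf_hash_i = HMAC-SHA256(key = salt_i, msg = utf8(canonical_line_i))
function leafHashSealed(salt, canonicalLine) {
  return createHmac("sha256", salt).update(canonicalLine, "utf8").digest("hex");
}

// Verifier side: a sealed revealed leaf without salt_b64 fails closed.
function verifySealedLeaf(revealed) {
  if (typeof revealed.salt_b64 !== "string" || revealed.salt_b64.length === 0) {
    throw new Error("sealed_leaf_missing_salt");
  }
  const salt = Buffer.from(revealed.salt_b64, "base64");
  return leafHashSealed(salt, revealed.value) === revealed.leaf_hash;
}

// Discloser side: derive the salt for leaf 0 and publish only that salt.
const master = Buffer.from([...Array(32).keys()]); // 0x00 .. 0x1f, never shipped
const salt0 = perLeafSalt(master, 0);
const revealed = {
  value: "First line of the memo.",
  salt_b64: salt0.toString("base64"),
  leaf_hash: leafHashSealed(salt0, "First line of the memo."),
};
console.log(verifySealedLeaf(revealed));
```

Because the leaf index feeds the HKDF info, two identical lines at different positions get different salts and therefore different leaves.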

5b.1 What a sealed disclosure carries — per-leaf salt, NEVER the master

The redact tool reads the 32-byte master salt from the SOURCE .mbnt manifest.json (salt_b64, base64url) and derives the per-leaf salts. The disclosure output carries ONLY the per-leaf salts of the revealed lines. THE MASTER-SALT-STRIP RULE (forever): a disclosure .mbnt MUST NOT contain the master salt in any encoding, and MUST NOT carry a redacted line's per-leaf salt. Shipping the master salt re-derives every per-leaf salt and unseals every redacted line. The tool enforces this structurally (it never ships the source manifest.json) and with a P0 runtime guard (redact-core.mjs:_assertMasterSaltStripped, scheme/mode-independent). Revealing the per-leaf HKDF salts of revealed lines leaks nothing about the master salt or other lines (HKDF-Expand is a PRF).
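
One way such a runtime guard could look is sketched below. This is not the body of _assertMasterSaltStripped, whose implementation is not reproduced in this profile: the function name, the error string master_salt_leak, and the set of encodings scanned are all hypothetical choices for illustration.

```javascript
// Hypothetical guard: scan the serialized disclosure for the master salt
// in several common encodings and refuse to emit it if any is found.
function assertNoMasterSalt(masterSalt, outputBytes) {
  const hex = masterSalt.toString("hex");
  const needles = [
    masterSalt,                                                      // raw bytes
    Buffer.from(hex, "utf8"),                                        // lowercase hex
    Buffer.from(hex.toUpperCase(), "utf8"),                          // uppercase hex
    Buffer.from(masterSalt.toString("base64").replace(/=+$/, ""), "utf8"), // base64, padded or not
    Buffer.from(masterSalt.toString("base64url"), "utf8"),           // base64url
  ];
  for (const needle of needles) {
    if (outputBytes.includes(needle)) {
      throw new Error("master_salt_leak"); // hypothetical error code
    }
  }
}

const master = Buffer.from([...Array(32).keys()]);
const safeOutput = Buffer.from(JSON.stringify({ revealed: [] }), "utf8");
assertNoMasterSalt(master, safeOutput); // passes silently
```

A byte scan like this is a backstop only; the structural rule (never shipping the source manifest.json) remains the primary defense.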

5b.2 Privacy posture

A sealed redacted line is unguessable: its leaf is an HMAC under a per-leaf salt the verifier cannot derive without the master salt, which the disclosure never carries. Standard = disclosed-lines-only guarantee with brute-forceable redacted lines; sealed = redacted lines stay private. The choice is made at anchor time.

5b.3 Worked example (NOT placeholders)

Same 3-line source as §4; master salt = 0x00 0x01 … 0x1f (the bearer secret, NEVER shipped). Sealed leaves:

leaf_id  value                    HMAC(salt_i, utf8(value))
l000000  First line of the memo.  625041249f20c24a50eeb4dde7e121520a245e0d7e03b1b7b3b09f8b6f94d48d
l000002  Third and final line.    ac4d39834feb39faa165dd89fd4888a28b6280411f953522ba512cd00f45f415

Frozen in tests/vectors/disclosure-v1/text_line_v1_native_sealed/S1.fixture.json.


6. Merkle behavior — DUPLICATE-LAST on odd

The tree is duplicate-last-on-odd, identical to csv-row-v1 and to the anchor (merkleRootFromHexLeaves / merkleRootFromLeafBytes): at each level an odd last node pairs with itself (SHA-256(node || node)). The verifier only walks proof_path — it never rebuilds the root — so the duplicate-last tree verifies with no merkle-walk change. The redact tool emits duplicate-last-correct paths (a self-sibling entry for the odd-promoted node).
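
The duplicate-last rule, path construction, and the verifier's walk can be sketched as follows. The sketch assumes nodes are concatenated as raw 32-byte digests before hashing, and the path entry shape {sibling, side} is hypothetical; the actual proof_path encoding is defined by the fixtures, not by this example.

```javascript
const { createHash } = require("node:crypto");

const sha256 = (buf) => createHash("sha256").update(buf).digest();
// ASSUMPTION: parent = SHA-256(left || right) over raw digest bytes.
const hashPair = (left, right) => sha256(Buffer.concat([left, right]));

// One level up: pair neighbours; an odd last node pairs with itself.
function nextLevel(level) {
  const out = [];
  for (let i = 0; i < level.length; i += 2) {
    const right = i + 1 < level.length ? level[i + 1] : level[i];
    out.push(hashPair(level[i], right));
  }
  return out;
}

function merkleRoot(leaves) {
  let level = leaves;
  while (level.length > 1) level = nextLevel(level);
  return level[0];
}

// Build the path for one leaf. An odd-promoted node gets a self-sibling entry.
function buildProofPath(leaves, index) {
  const path = [];
  let level = leaves;
  let i = index;
  while (level.length > 1) {
    const sib = i % 2 === 0 ? i + 1 : i - 1;
    const sibling = sib < level.length ? level[sib] : level[i];
    path.push({ sibling, side: i % 2 === 0 ? "right" : "left" });
    level = nextLevel(level);
    i = Math.floor(i / 2);
  }
  return path;
}

// Verifier: fold the path over the leaf; no tree rebuild needed.
function walkProofPath(leaf, path) {
  return path.reduce(
    (node, { sibling, side }) =>
      side === "right" ? hashPair(node, sibling) : hashPair(sibling, node),
    leaf
  );
}

const [A, B, C] = ["A", "B", "C"].map((s) => sha256(Buffer.from(s, "utf8")));
const root = merkleRoot([A, B, C]);
console.log(walkProofPath(C, buildProofPath([A, B, C], 2)).equals(root));
```

For three leaves A, B, C the rule gives root = SHA-256(SHA-256(A || B) || SHA-256(C || C)), so the first entry in C's path is C itself.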

Worked example (the §4 three-leaf tree)

Leaves A=l000000, B=l000001, C=l000002 (the §4 hashes).

Proof paths (frozen in N1):


7. Original anchor binding

A disclosure binds to the existing anchor via the §4 chain of disclosure-v1.md: the carrier canonical.json (carried VERBATIM) hashes to the on-chain document_hash; its subject.proofs.chunk_merkle.root equals linked_anchor.root; its scheme equals linked_anchor.subject_profile == "text-line-v1"; and its algo selects the leaf rule (sha256 standard / merkle-hmac-sha256 sealed). The redact tool recomputes the leaves from the original file, hard-fails if they do not match the committed merkle_leaves + root (wrong file / wrong bundle / edited file), then builds proof paths for the revealed lines. No re-anchor; no new scheme.
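
The binding chain can be sketched as a sequence of equality checks. This is a simplified illustration under stated assumptions: document_hash is taken to be the hex sha256 of the verbatim canonical.json bytes (the hash function is not specified in this section), and the function name and the reason strings are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical check of the binding chain described above.
function checkAnchorBinding({ canonicalJsonBytes, documentHashHex, linkedAnchor }) {
  // ASSUMPTION: document_hash = hex sha256 over the carrier bytes, carried verbatim.
  const digest = createHash("sha256").update(canonicalJsonBytes).digest("hex");
  if (digest !== documentHashHex) return { ok: false, reason: "document_hash_mismatch" };

  const cm = JSON.parse(canonicalJsonBytes.toString("utf8")).subject.proofs.chunk_merkle;
  if (cm.root !== linkedAnchor.root) return { ok: false, reason: "root_mismatch" };
  if (cm.scheme !== "text-line-v1" || linkedAnchor.subject_profile !== "text-line-v1") {
    return { ok: false, reason: "scheme_mismatch" };
  }
  // algo selects the leaf rule.
  if (cm.algo === "sha256") return { ok: true, mode: "standard" };
  if (cm.algo === "merkle-hmac-sha256") return { ok: true, mode: "sealed" };
  return { ok: false, reason: "unknown_algo" };
}

const carrier = Buffer.from(JSON.stringify({
  subject: { proofs: { chunk_merkle: { scheme: "text-line-v1", algo: "sha256", root: "ab".repeat(32) } } },
}), "utf8");
console.log(checkAnchorBinding({
  canonicalJsonBytes: carrier,
  documentHashHex: createHash("sha256").update(carrier).digest("hex"),
  linkedAnchor: { root: "ab".repeat(32), subject_profile: "text-line-v1" },
}));
```

The carrier is hashed as bytes before it is parsed, so any re-serialization of canonical.json breaks the first check.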

The redacted copy emits the canonical non-empty lines (NFC, trailing-ws stripped, blank lines dropped) — revealed lines as their value, redacted lines as [REDACTED], positions preserved among the leaf-set, \n-joined. This is what is cryptographically attested; presentation.format == "txt", presentation.view_sha256 == sha256(redacted bytes).
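
A sketch of how the redacted copy could be assembled from the leaf set. The function name is hypothetical, and two details are assumptions: the joined text has no trailing newline, and view_sha256 is the hex sha256 of the UTF-8 bytes of that text.

```javascript
const { createHash } = require("node:crypto");

// leaves: [{ leaf_id, value }] in leaf order; revealedIds: Set of leaf_id strings.
function renderRedactedView(leaves, revealedIds) {
  const text = leaves
    .map((leaf) => (revealedIds.has(leaf.leaf_id) ? leaf.value : "[REDACTED]"))
    .join("\n"); // ASSUMPTION: no trailing newline after the last line
  const bytes = Buffer.from(text, "utf8");
  return {
    text,
    format: "txt",
    view_sha256: createHash("sha256").update(bytes).digest("hex"),
  };
}

const leaves = [
  { leaf_id: "l000000", value: "First line of the memo." },
  { leaf_id: "l000001", value: "Second line has trailing spaces." },
  { leaf_id: "l000002", value: "Third and final line." },
];
console.log(renderRedactedView(leaves, new Set(["l000000", "l000002"])).text);
```

Each leaf keeps its position among the leaf set whether revealed or redacted, so the view has exactly one output line per leaf.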


8. Fixtures (test vectors)

[FOREVER-CONTRACT] — disclosure-v1.md §11 forbids a profile without vectors. Frozen, oracle-computed + tool-cross-checked:


9. Out of scope / deprecation pointers


11. Profile registry pointer

Registered in disclosure-v1.md §11. text-line-v1 is the native text-line rule that text anchors actually emit; a disclosure binds to the chunk_merkle the anchor already committed (scheme == "text-line-v1"), revealing a subset of its per-line leaves — no re-anchor, no new scheme. Leaf rule: §§2–4 standard, §5b sealed; merkle §6; binding §7; vectors §8.

Questions about this specification? Email hello@satsignal.cloud.