text-line-v1 — selective-disclosure profile for TEXT line leaves (native anchor rule)

Status: active. This profile is the disclosure-side write-down of the per-line chunk_merkle rule every text anchor already commits — it adds no new on-chain behavior. It repeats, for text, exactly what csv-row-v1.md does for CSV. subject_profile literal: text-line-v1 (hyphenated — the literal text anchors stamp into subject.proofs.chunk_merkle.scheme). This is a NATIVE profile: the leaf is the bare sha256(utf8(line)) (standard) or HMAC(per-leaf HKDF salt, utf8(line)) (sealed) — never the salted framed preimage of the deprecated satsignal.text.paragraph_sentence.v1.

1. Why this exists

A selective disclosure proves a revealed unit into the exact merkle leaf the anchor already committed on chain. Text anchors chunk a .txt / .md file by line under chunk_merkle.scheme = "text-line-v1". To redact a text file you already anchored — revealing some lines, withholding others — the disclosure leaf rule MUST equal the anchor's text-line-v1 leaf rule byte-for-byte. This profile pins that rule.

Granularity is LINE (locked). Sentence-level redaction has no on-chain sentence leaves to prove into and is a planned FUTURE anchor scheme (text-sentence-v1), not this profile. The deprecated satsignal.text.paragraph_sentence.v1 profile cannot bind to a live text anchor (salted/framed leaf + sentence granularity); see §9.

2. Inputs and canonicalization (`text-norm-v1`)

The leaf-set is computed from the original file bytes via the SAME canon the anchor applies (web/templates.py normalizeTextForCanonical / normalizeText, byte-identical in the standard and sealed anchor branches):

- BOM strip: remove ONE leading U+FEFF if present.
- NFC: String.prototype.normalize("NFC") over the whole string.
- Line endings: replace \r\n and a lone \r with \n (regex /\r\n?/g → \n).
- Per-line trailing whitespace: strip a trailing run of spaces and tabs (/[ \t]+$/) from each line. Interior whitespace is preserved.

The decoder is lenient UTF-8 (invalid bytes → U+FFFD), matching the anchor's file.text(). A file the anchor accepted recomputes to the same leaves here; a true mismatch surfaces as the distinct recompute-mismatch failure (§7), never a silent reject.

The anchor's content_canonical is the sha256 of the full canonical string under scheme text-norm-v1. It is not an input to the per-leaf rule.
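The canonicalization steps in this section can be sketched as follows. This is a minimal illustration, not the anchor's actual normalizeTextForCanonical: the function names canonicalizeText and contentCanonicalHash are hypothetical, and the sketch assumes that a default TextDecoder is an acceptable stand-in for the anchor's file.text().

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: applies the text-norm-v1 steps to raw file bytes.
function canonicalizeText(fileBytes) {
  // Default TextDecoder is lenient (invalid bytes become U+FFFD) and
  // consumes a leading UTF-8 BOM during decoding.
  let s = new TextDecoder("utf-8").decode(fileBytes);
  // BOM strip: remove ONE leading U+FEFF if one is still present.
  if (s.charCodeAt(0) === 0xfeff) s = s.slice(1);
  // NFC over the whole string.
  s = s.normalize("NFC");
  // Line endings: \r\n and lone \r become \n.
  s = s.replace(/\r\n?/g, "\n");
  // Per-line trailing spaces and tabs are stripped; interior whitespace stays.
  return s
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))
    .join("\n");
}

// Hypothetical helper: sha256 over the UTF-8 bytes of the full canonical string.
function contentCanonicalHash(canonical) {
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}
```

The trailing-whitespace strip is applied per line after splitting on \n, rather than with a multiline regex, so that only \n counts as a line boundary.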

3. Leaf extraction — split on `\n`, DROP empty lines, NO header

After canonicalization, segment into leaves:

leaves = canonical.split("\n").filter(L => L.length > 0)

- Empty lines are DROPPED. A blank line (or a line that was only trailing whitespace, now stripped to "") is not a leaf. The trailing "" produced by a final \n is likewise dropped.
- No header concept. Unlike csv-row-v1 (which excludes row 0), text-line-v1 keeps every non-empty line: leaf 0 is the first non-empty line.
- Leaf ordering is non-empty-line document order, zero-indexed. A leaf index i is its position in the filtered non-empty-line list, NOT the source file line number (blank lines shift the mapping).
- leaf_id = "l" + 6-digit zero-padded leaf index (e.g. l000000). It is a display / ordering handle only and is NOT part of any hash preimage. (The "l" prefix differs from csv-row-v1's "r" for readability only; it is not load-bearing.)

A file with zero non-empty lines is not a valid text-line-v1 disclosure source (no leaves to prove); the tool fails with invalid_text_empty.
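A small sketch of the segmentation rule, assuming the input is already canonical per §2. The function name extractLeaves and the shape of the thrown error (an Error with a code property) are illustrative assumptions; only the invalid_text_empty literal comes from this profile.

```javascript
// Hypothetical helper: canonical string -> ordered list of {leaf_id, value}.
function extractLeaves(canonical) {
  // Split on \n and drop empty lines; there is no header row to skip.
  const lines = canonical.split("\n").filter((line) => line.length > 0);
  if (lines.length === 0) {
    // Assumed error shape; the profile only names the failure code.
    const err = new Error("no non-empty lines to prove");
    err.code = "invalid_text_empty";
    throw err;
  }
  // leaf_id is "l" plus the 6-digit zero-padded index in the FILTERED list.
  return lines.map((value, i) => ({
    leaf_id: "l" + String(i).padStart(6, "0"),
    value,
  }));
}
```

For a canonical string such as "First\n\nSecond\n", the blank line and the trailing empty segment are dropped, so "Second" becomes l000001 even though it sits on source line 3.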

4. Leaf hash — bare `sha256` of the canonical line (standard mode)

For a STANDARD text anchor (chunk_merkle.algo == "sha256"):

leaf_hash_i = sha256( utf8( canonical_line_i ) )

Bare — no profile literal, no leaf_id, no salt, no 0x00 separators. A standard text-line-v1 revealed[i] carries {leaf_id, profile: "text-line-v1", value: <canonical line string>, leaf_hash, proof_path} and no salt_b64 (§5).

The verifier's value→bytes rule is utf8(value); it recomputes sha256(utf8(value)) and compares to the published leaf_hash, then walks proof_path to linked_anchor.root.
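The standard leaf rule and the verifier's leaf check are short enough to sketch directly. The names leafHashStandard and checkRevealedLeafHash are hypothetical, and the sketch assumes leaf_hash is carried as a hex string; the proof_path walk is left out here.

```javascript
const { createHash } = require("node:crypto");

// Bare sha256 over the UTF-8 bytes of the canonical line: no salt, no framing.
function leafHashStandard(value) {
  return createHash("sha256").update(Buffer.from(value, "utf8")).digest("hex");
}

// Hypothetical verifier step: recompute from `value` and compare with the
// published leaf_hash (assumed hex; compared case-insensitively).
function checkRevealedLeafHash(revealed) {
  return leafHashStandard(revealed.value) === revealed.leaf_hash.toLowerCase();
}
```

Because the preimage is only the line itself, two equal canonical lines always hash to the same leaf, which is the property §5 discusses.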

Worked example (NOT placeholders — computed against the anchor rule)

Source file bytes (a leading UTF-8 BOM, CRLF line endings, a blank line, and a line with trailing whitespace):

EF BB BF "First line of the memo.\r\n" "\r\n"
         "Second line has trailing spaces.   \r\n" "Third and final line.\r\n"

Canon + segmentation → 3 non-empty-line leaves (BOM stripped; the blank source line 2 dropped; source line 3's trailing spaces stripped, so it becomes leaf l000001):

leaf_id	value (canonical line)	`sha256(utf8(value))`
l000000	`First line of the memo.`	`8187a5534ddc483f4c936f872837a8bf39d2d225f18307c496954cbe7dffe119`
l000001	`Second line has trailing spaces.`	`e3baa7985e2bcdf6cf2d7363cec05456728a2a5d3f98b37542392f24e9a72162`
l000002	`Third and final line.`	`dd52bddca31b1a03b84525208f2d4e7dd2132f4b90621dd674eb3a69c9a2c428`

These are frozen in tests/vectors/disclosure-v1/text_line_v1_native/N1.fixture.json.

5. Salts — standard mode is UNSALTED (privacy posture is first-class)

Standard text-line-v1 leaves are unsalted bare sha256(utf8(line)). The exposure is broader than "an incidental proof_path sibling leaks":

- The standard .mbnt publishes EVERY leaf hash, including redacted lines. proofs.json carries merkle_leaves = the complete ordered list of every non-empty-line leaf hash, redacted lines included. A holder of the standard bundle has the exact sha256(canonical_line) of each withheld line and can guess-and-confirm it entirely offline, not only when a withheld line happens to sit on a revealed line's proof_path.
- Zero per-leaf entropy ⇒ identical withheld lines have identical leaf hashes. With no salt, no leaf_id, and no profile tag in the preimage (§4), two redacted lines with the same canonical content produce the same leaf hash, so an observer can tell which redacted lines share a value (cross-equality leak) without recovering it.
- A withheld line's recovery cost equals THAT LINE'S OWN entropy. Lines drawn from a small or structured space (a short status line, a boolean or enum, a date, a name from a known list, or a known-format identifier) are trivially recoverable: an adversary confirms each guess with one hash against the published leaf hash. Only genuinely high-entropy free-form lines are protected.

This is the anchor-time-chosen tradeoff, not a defect — the discloser accepted it by anchoring in standard mode. Do NOT use standard mode to withhold low-entropy or small-space sensitive lines; route that data to sealed mode (§5b), where redacted lines are unguessable and equal withheld lines do not collide. No keyless scheme can protect a redacted line that is itself the guessable secret; that is the cost of the no-keyfile requirement and the reason sealed exists. Choose sealed before anchoring if any withheld line could be low-entropy.
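To make the standard-mode exposure concrete, here is a sketch of what a bundle holder can do offline with nothing but the published merkle_leaves. The function names and the candidate list are hypothetical illustrations, not part of any tool.

```javascript
const { createHash } = require("node:crypto");

const sha256Hex = (s) => createHash("sha256").update(s, "utf8").digest("hex");

// Guess-and-confirm: returns the candidate whose bare hash equals the
// published leaf hash of a withheld line, or null if none matches.
function guessWithheldLine(publishedLeafHash, candidates) {
  for (const guess of candidates) {
    if (sha256Hex(guess) === publishedLeafHash) return guess;
  }
  return null;
}

// Cross-equality leak: groups leaf indexes that share a hash. Indexes in the
// same group hold equal canonical lines, even if none is ever recovered.
function equalLineGroups(merkleLeaves) {
  const groups = new Map();
  merkleLeaves.forEach((hash, i) => {
    if (!groups.has(hash)) groups.set(hash, []);
    groups.get(hash).push(i);
  });
  return [...groups.values()].filter((g) => g.length > 1);
}
```

A two-value space such as an approved/rejected status line costs at most two hashes to recover, which is why such lines belong in sealed mode.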

A standard revealed[i] MUST NOT carry salt_b64; this binds the discloser. On the verifier side, the structural schema treats salt_b64 as optional for text-line-v1, and the verifier ignores any stray salt_b64 under the bare-sha256 standard rule rather than rejecting it.

5b. Sealed mode — HMAC leaf under a per-leaf HKDF salt (`algo: "merkle-hmac-sha256"`)

For a SEALED text anchor (chunk_merkle.algo == "merkle-hmac-sha256", chunk_merkle.salt_version == "salt_v1"), the leaf is keyed:

salt_i     = HKDF-SHA256(ikm = master_salt,
                         salt = "satsignal-sealed-v1/per-leaf",
                         info = "chunk/" || u32_be(i), L = 32)
leaf_hash_i = HMAC-SHA256(key = salt_i, msg = utf8(canonical_line_i))

This is the same per-leaf HKDF/HMAC derivation the sealed CSV anchor uses — the anchor's sealed merkle assembly is generic across file types (web/templates.py ~6056). Only the leaf hash differs from standard; canonicalization (§2), segmentation (§3), and the merkle (§6) are identical.

A sealed revealed[i] carries salt_b64 = base64(salt_i) — the PER-LEAF salt for that revealed line. salt_b64 is REQUIRED for a sealed leaf; a sealed carrier with a revealed leaf missing salt_b64 fails closed with sealed_leaf_missing_salt.
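The sealed derivation above might be implemented along these lines with Node's built-in crypto module. This is a sketch under stated assumptions: the HKDF salt and the "chunk/" prefix are taken as their UTF-8 bytes, u32_be(i) as a 4-byte big-endian integer, and the function names are hypothetical. It has not been checked against the frozen S1 vectors.

```javascript
const { hkdfSync, createHmac } = require("node:crypto");

// salt_i = HKDF-SHA256(ikm = master_salt, salt = "satsignal-sealed-v1/per-leaf",
//                      info = "chunk/" || u32_be(i), L = 32)
function perLeafSalt(masterSalt, i) {
  const index = Buffer.alloc(4);
  index.writeUInt32BE(i, 0); // u32_be(i)
  const info = Buffer.concat([Buffer.from("chunk/", "utf8"), index]);
  const hkdfSalt = Buffer.from("satsignal-sealed-v1/per-leaf", "utf8");
  // hkdfSync returns an ArrayBuffer; wrap it so callers get a Buffer.
  return Buffer.from(hkdfSync("sha256", masterSalt, hkdfSalt, info, 32));
}

// leaf_hash_i = HMAC-SHA256(key = salt_i, msg = utf8(canonical_line_i))
function leafHashSealed(masterSalt, i, canonicalLine) {
  return createHmac("sha256", perLeafSalt(masterSalt, i))
    .update(canonicalLine, "utf8")
    .digest("hex");
}
```

Because the leaf index feeds the HKDF info, two equal lines at different positions get different salts and therefore different leaf hashes, unlike standard mode.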

5b.1 What a sealed disclosure carries — per-leaf salt, NEVER the master

The redact tool reads the 32-byte master salt from the SOURCE .mbnt manifest.json (salt_b64, base64url) and derives the per-leaf salts. The disclosure output carries ONLY the per-leaf salts of the revealed lines. THE MASTER-SALT-STRIP RULE (forever): a disclosure .mbnt MUST NOT contain the master salt in any encoding, and MUST NOT carry a redacted line's per-leaf salt. Shipping the master salt re-derives every per-leaf salt and unseals every redacted line. The tool enforces this structurally (it never ships the source manifest.json) and with a P0 runtime guard (redact-core.mjs:_assertMasterSaltStripped, scheme/mode-independent). Revealing the per-leaf HKDF salts of revealed lines leaks nothing about the master salt or other lines (HKDF-Expand is a PRF).
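One way to picture a runtime guard for the master-salt-strip rule is a byte scan of every output file for the master salt in common encodings. The sketch below is purely illustrative and is not the tool's _assertMasterSaltStripped: the function name, the outputFiles shape (file name mapped to bytes), and the choice of encodings are all assumptions, and a scan like this cannot cover "any encoding" by itself.

```javascript
// Hypothetical guard: throws if the master salt appears in any output file as
// raw bytes, hex (either case), base64, or base64url.
function assertMasterSaltAbsent(masterSalt, outputFiles) {
  const hex = masterSalt.toString("hex");
  const needles = [
    masterSalt,
    Buffer.from(hex, "ascii"),
    Buffer.from(hex.toUpperCase(), "ascii"),
    // Padding is dropped so padded and unpadded forms both match.
    Buffer.from(masterSalt.toString("base64").replace(/=+$/, ""), "ascii"),
    Buffer.from(masterSalt.toString("base64url"), "ascii"),
  ];
  for (const [name, bytes] of Object.entries(outputFiles)) {
    const haystack = Buffer.from(bytes);
    for (const needle of needles) {
      if (haystack.includes(needle)) {
        throw new Error("master salt found in " + name);
      }
    }
  }
}
```

A scan of this kind is a backstop only; the structural rule of never copying the source manifest.json into the disclosure is what keeps the master salt out in the first place.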

5b.2 Privacy posture

A sealed redacted line is unguessable: its leaf is an HMAC under a per-leaf salt the verifier cannot derive without the master salt, which the disclosure never carries. Standard = disclosed-lines-only guarantee with brute-forceable redacted lines; sealed = redacted lines stay private. The choice is made at anchor time.

5b.3 Worked example (NOT placeholders)

Same source as §4 (three leaves); master salt = the 32 bytes 0x00 0x01 … 0x1f (the bearer secret, NEVER shipped). Sealed leaf hashes for l000000 and l000002:

leaf_id	value	`HMAC(salt_i, utf8(value))`
l000000	`First line of the memo.`	`625041249f20c24a50eeb4dde7e121520a245e0d7e03b1b7b3b09f8b6f94d48d`
l000002	`Third and final line.`	`ac4d39834feb39faa165dd89fd4888a28b6280411f953522ba512cd00f45f415`

Frozen in tests/vectors/disclosure-v1/text_line_v1_native_sealed/S1.fixture.json.

6. Merkle behavior — DUPLICATE-LAST on odd

The tree is duplicate-last-on-odd, identical to csv-row-v1 and to the anchor (merkleRootFromHexLeaves / merkleRootFromLeafBytes): at each level an odd last node pairs with itself (SHA-256(node || node)). The verifier only walks proof_path and never rebuilds the root, so the duplicate-last tree verifies with no merkle-walk change. The redact tool emits duplicate-last-correct paths (a self-sibling entry wherever the node on the path is the odd last node of its level).

Worked example (the §4 three-leaf tree)

Leaves A=l000000, B=l000001, C=l000002 (the §4 hashes).

Level 0 → 1: pair A,B → L1[0] = SHA-256(A || B) = c5ec33d952be464863cf30c6cb6eb90ddbba981c6b22ff4e0acc71728408569d. C is the odd last node → self-pairs → L1[1] = SHA-256(C || C).
ROOT = SHA-256(L1[0] || L1[1]) = 5e4f6278d3e8f1a8175e8f635e76f828262e48b3f4e5b6ffd326357cc04a607c.

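Proof paths (frozen in N1). Each entry is {side, hash}, where side says whether the sibling sits to the Left or Right of the running node: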

l000000 (A) — [{R, B}, {R, L1[1]}] (B is the level-0 sibling; L1[1] at level 1).
l000002 (C) — the two-entry self-sibling path [{R, C(itself)}, {L, L1[0]}]: fold C against its own hash to reach L1[1], then against L1[0] to reach the root. The verifier MUST walk this; it must NOT reject the self-sibling entry or assume promote-unchanged.
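The duplicate-last tree and the proof walk can be sketched as below. The assumptions are that nodes are concatenated as raw 32-byte digests (not hex text), that a path entry has the hypothetical shape {side, hash} with side naming the sibling's position, and that the function names are illustrative rather than the anchor's merkleRootFromHexLeaves.

```javascript
const { createHash } = require("node:crypto");

const sha256 = (buf) => createHash("sha256").update(buf).digest();
const pair = (left, right) => sha256(Buffer.concat([left, right]));

// Builds every level of the tree; level 0 is the leaf hashes.
function buildLevels(leafHashesHex) {
  let level = leafHashesHex.map((h) => Buffer.from(h, "hex"));
  const levels = [level];
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      // Duplicate-last on odd: a lone last node pairs with itself.
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(pair(level[i], right));
    }
    levels.push(next);
    level = next;
  }
  return levels;
}

// Emits the proof path for a leaf, including self-sibling entries.
function proofPath(levels, index) {
  const path = [];
  for (let d = 0; d < levels.length - 1; d++) {
    const level = levels[d];
    const isRight = index % 2 === 1;
    const sibIndex = isRight ? index - 1 : Math.min(index + 1, level.length - 1);
    path.push({ side: isRight ? "L" : "R", hash: level[sibIndex].toString("hex") });
    index = Math.floor(index / 2);
  }
  return path;
}

// Verifier walk: fold the leaf against each sibling and return the result,
// which the caller compares with linked_anchor.root.
function walkProof(leafHashHex, path) {
  let node = Buffer.from(leafHashHex, "hex");
  for (const step of path) {
    const sib = Buffer.from(step.hash, "hex");
    node = step.side === "L" ? pair(sib, node) : pair(node, sib);
  }
  return node.toString("hex");
}
```

For a three-leaf tree, proofPath(levels, 2) yields a first entry whose hash is the leaf's own hash with side "R", matching the self-sibling path described for l000002.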

7. Original anchor binding

A disclosure binds to the existing anchor via the §4 chain of disclosure-v1.md: the carrier canonical.json (carried VERBATIM) hashes to the on-chain document_hash; its subject.proofs.chunk_merkle.root equals linked_anchor.root; its scheme equals linked_anchor.subject_profile == "text-line-v1"; and its algo selects the leaf rule (sha256 standard / merkle-hmac-sha256 sealed). The redact tool recomputes the leaves from the original file, hard-fails if they do not match the committed merkle_leaves + root (wrong file / wrong bundle / edited file), then builds proof paths for the revealed lines. No re-anchor; no new scheme.

The redacted copy emits the canonical non-empty lines (NFC, trailing-ws stripped, blank lines dropped) — revealed lines as their value, redacted lines as [REDACTED], positions preserved among the leaf-set, \n-joined. This is what is cryptographically attested; presentation.format == "txt", presentation.view_sha256 == sha256(redacted bytes).
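The redacted view described above could be rendered roughly as follows. The function name renderRedactedView is hypothetical, and the sketch assumes the \n-joined output has no trailing newline; the profile text does not say either way.

```javascript
const { createHash } = require("node:crypto");

// leafValues: canonical non-empty lines in leaf order.
// revealedIndexes: leaf indexes the discloser chose to reveal.
function renderRedactedView(leafValues, revealedIndexes) {
  const keep = new Set(revealedIndexes);
  // Redacted lines keep their position in the leaf-set as a placeholder.
  const text = leafValues
    .map((value, i) => (keep.has(i) ? value : "[REDACTED]"))
    .join("\n"); // assumption: no trailing newline
  const bytes = Buffer.from(text, "utf8");
  const view_sha256 = createHash("sha256").update(bytes).digest("hex");
  return { bytes, view_sha256 };
}
```

Since blank source lines are never leaves, the view has one output line per leaf and does not reproduce the original file's blank-line layout.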

8. Fixtures (test vectors)

[FOREVER-CONTRACT] — disclosure-v1.md §11 forbids a profile without vectors. Frozen, oracle-computed + tool-cross-checked:

Standard: tests/vectors/disclosure-v1/text_line_v1_native/
- N1 — the §4 happy path (reveal l000000 + l000002, redact l000001; no salt_b64).
- N2_linked_anchor_profile_mismatch — carrier scheme text-line-v2, root equal → linked_anchor_profile_mismatch.
- N3_empty_line_included_mistake — a non-conformant discloser kept a blank line as a leaf (wrong tree); the bare-sha256 leaf still matches but the wrong-tree path misses the root → merkle_path_mismatch (pins §3 drop-empties).
- negatives/ overlays — leaf_hash_mismatch, merkle_path_mismatch, linked_anchor_root_mismatch, linked_anchor_canonical_hash_mismatch.
Sealed: tests/vectors/disclosure-v1/text_line_v1_native_sealed/
- S1 — the §5b happy path (per-leaf salt_b64; odd-last self-sibling).
- negatives/ — S1_leaf_hash_mismatch, S1_wrong_salt, S1_merkle_path_mismatch, S1_linked_anchor_root_mismatch, S1_linked_anchor_canonical_hash_mismatch, S2_missing_salt (sealed_leaf_missing_salt), S3_wrong_salt_version (unsupported_linked_algo), S4_linked_anchor_profile_mismatch.

9. Out of scope / deprecation pointers

Sentence/paragraph granularity is NOT this profile. The deprecated salted satsignal.text.paragraph_sentence.v1 (text-paragraph-sentence-v1.md) chunks by sentence with a profile||0x00||leaf_id||0x00||value||0x00||salt preimage and cannot bind to a live text-line-v1 anchor. It stays inert (allowlist literals are never removed; its frozen corpus is a regression guard only). A future text-sentence-v1 ANCHOR scheme + a de-salted native profile is a separate effort.
Binary text containers (PDF text, .docx) anchor under their own schemes (pdf-page-v1, zip-file-v1) and are out of scope here.

11. Profile registry pointer

Registered in disclosure-v1.md §11. text-line-v1 is the native text-line rule that text anchors actually emit; a disclosure binds to the chunk_merkle the anchor already committed (scheme == "text-line-v1"), revealing a subset of its per-line leaves — no re-anchor, no new scheme. Leaf rule: §§2–4 standard, §5b sealed; merkle §6; binding §7; vectors §8.
