text-line-v1 — selective-disclosure profile for TEXT line leaves (native anchor rule)
Status: active. This profile is the disclosure-side write-down of the per-line chunk_merkle rule every text anchor already commits; it adds no new on-chain behavior. It repeats, for text, exactly what csv-row-v1.md does for CSV.
subject_profile literal: text-line-v1 (hyphenated; the literal text anchors stamp into subject.proofs.chunk_merkle.scheme). This is a NATIVE profile: the leaf is the bare sha256(utf8(line)) (standard) or HMAC(per-leaf HKDF salt, utf8(line)) (sealed), never the salted framed preimage of the deprecated satsignal.text.paragraph_sentence.v1.
1. Why this exists
A selective disclosure proves a revealed unit into the exact merkle leaf the anchor already committed on chain. Text anchors chunk a .txt / .md file by line under chunk_merkle.scheme = "text-line-v1". To redact a text file you already anchored — revealing some lines, withholding others — the disclosure leaf rule MUST equal the anchor's text-line-v1 leaf rule byte-for-byte. This profile pins that rule.
Granularity is LINE (locked). Sentence-level redaction has no on-chain sentence leaves to prove into and is a planned FUTURE anchor scheme (text-sentence-v1), not this profile. The deprecated satsignal.text.paragraph_sentence.v1 profile cannot bind to a live text anchor (salted/framed leaf + sentence granularity); see §9.
2. Inputs and canonicalization (text-norm-v1)
The leaf-set is computed from the original file bytes via the SAME canon the anchor applies (web/templates.py normalizeTextForCanonical / normalizeText, byte-identical in the standard and sealed anchor branches):
- BOM strip: remove ONE leading U+FEFF if present.
- NFC: String.prototype.normalize("NFC") over the whole string.
- Line endings: replace \r\n and a lone \r with \n (regex /\r\n?/g → \n).
- Per-line trailing whitespace: strip a trailing run of spaces and tabs (/[ \t]+$/) from each line. Interior whitespace is preserved.
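The four steps above can be sketched as a single function. This is a minimal illustration only: canonicalizeText is a hypothetical name, not the anchor's normalizeTextForCanonical, and it assumes the input has already been decoded to a string.

```javascript
// Sketch of the text-norm-v1 steps listed above.
// canonicalizeText is a hypothetical helper name used for illustration.
function canonicalizeText(text) {
  // 1. BOM strip: remove ONE leading U+FEFF if present.
  let s = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  // 2. NFC over the whole string.
  s = s.normalize("NFC");
  // 3. Line endings: \r\n and a lone \r both become \n.
  s = s.replace(/\r\n?/g, "\n");
  // 4. Per-line trailing whitespace: strip trailing spaces and tabs only.
  return s
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))
    .join("\n");
}

const source =
  "\uFEFFFirst line of the memo.\r\n\r\nSecond line has trailing spaces.   \r\nThird and final line.\r\n";
const canonical = canonicalizeText(source);
console.log(JSON.stringify(canonical));
// "First line of the memo.\n\nSecond line has trailing spaces.\nThird and final line.\n"
```

Blank lines survive canonicalization; they are only dropped later, at leaf extraction.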
The decoder is lenient UTF-8 (invalid bytes → U+FFFD), matching the anchor's file.text(). A file the anchor accepted recomputes to the same leaves here; a true mismatch surfaces as the distinct recompute-mismatch failure (§7), never a silent reject.
The content-canonical hash is sha256 of the full canonical string under scheme text-norm-v1; it is not part of the per-leaf rule but is the anchor's content_canonical.
3. Leaf extraction — split on \n, DROP empty lines, NO header
After canonicalization, segment into leaves:
leaves = canonical.split("\n").filter(L => L.length > 0)
- Empty lines are DROPPED. A blank line (or a line that was only trailing whitespace, now stripped to "") is not a leaf. The trailing "" produced by a final \n is likewise dropped.
- No header concept. Unlike csv-row-v1 (which excludes row 0), text-line-v1 keeps every non-empty line: leaf 0 is the first non-empty line.
- Leaf ordering is non-empty-line document order, zero-indexed. A leaf index i is its position in the filtered non-empty-line list; it is NOT the source file line number (blank lines shift the mapping).
- leaf_id = "l" + 6-digit zero-padded leaf index (e.g. l000000). Display / ordering handle only, NOT part of any hash preimage. (The "l" prefix differs from csv-row-v1's "r" for readability only; it is not load-bearing.)
A file with zero non-empty lines is not a valid text-line-v1 disclosure source (no leaves to prove); the tool fails with invalid_text_empty.
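As a rough sketch of the segmentation rules in this section, the snippet below splits a canonical string, drops empty lines, assigns leaf_id values, and raises invalid_text_empty when nothing remains. The function name extractLeaves and the shape of the returned objects are assumptions made for illustration.

```javascript
// Sketch of leaf extraction: split on \n, drop empty lines, no header.
// extractLeaves and the { leaf_id, index, value } shape are hypothetical.
function extractLeaves(canonical) {
  const lines = canonical.split("\n").filter((line) => line.length > 0);
  if (lines.length === 0) {
    const err = new Error("no non-empty lines to prove");
    err.code = "invalid_text_empty";
    throw err;
  }
  return lines.map((value, index) => ({
    // Display / ordering handle only; never part of a hash preimage.
    leaf_id: "l" + String(index).padStart(6, "0"),
    index,
    value,
  }));
}

const leaves = extractLeaves(
  "First line of the memo.\n\nSecond line has trailing spaces.\nThird and final line.\n"
);
console.log(leaves.map((leaf) => leaf.leaf_id));
// [ 'l000000', 'l000001', 'l000002' ]
```

Note that leaf l000001 is source line 3 here: the blank line shifts the mapping between leaf index and file line number.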
4. Leaf hash — bare sha256 of the canonical line (standard mode)
For a STANDARD text anchor (chunk_merkle.algo == "sha256"):
leaf_hash_i = sha256( utf8( canonical_line_i ) )
Bare — no profile literal, no leaf_id, no salt, no 0x00 separators. A standard text-line-v1 revealed[i] carries {leaf_id, profile: "text-line-v1", value: <canonical line string>, leaf_hash, proof_path} and no salt_b64 (§5).
The verifier's value→bytes rule is utf8(value); it recomputes sha256(utf8(value)) and compares to the published leaf_hash, then walks proof_path to linked_anchor.root.
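The standard-mode leaf rule and the verifier's recompute step might look like the following in Node.js. The helper names leafHashStandard and leafHashMatches are illustrative, and the proof_path walk is deliberately left out of this sketch.

```javascript
const { createHash } = require("crypto");

// Bare sha256 over the UTF-8 bytes of the canonical line: no salt,
// no profile literal, no leaf_id, no separators.
function leafHashStandard(canonicalLine) {
  return createHash("sha256")
    .update(Buffer.from(canonicalLine, "utf8"))
    .digest("hex");
}

// Verifier-side check of one revealed entry (hypothetical helper):
// recompute sha256(utf8(value)) and compare with the published leaf_hash.
function leafHashMatches(revealed) {
  return leafHashStandard(revealed.value) === revealed.leaf_hash;
}

const entry = {
  leaf_id: "l000002",
  profile: "text-line-v1",
  value: "Third and final line.",
  leaf_hash: leafHashStandard("Third and final line."),
  proof_path: [], // omitted in this sketch
};
console.log(leafHashMatches(entry)); // true
console.log(leafHashMatches({ ...entry, value: "Third and final line. " })); // false
```

The second call fails because a single trailing space changes the preimage, which is why the disclosure must carry the canonical line rather than the raw source line.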
Worked example (NOT placeholders — computed against the anchor rule)
Source file bytes (BOM + CRLF, a blank line, and a trailing-whitespace line):
"First line of the memo.\r\n"
"\r\n"
"Second line has trailing spaces. \r\n"
"Third and final line.\r\n"
Canon + segmentation → 3 non-empty-line leaves (the blank line dropped; line 3's trailing spaces stripped):
| leaf_id | value (canonical line) | sha256(utf8(value)) |
|---|---|---|
| l000000 | First line of the memo. | 8187a5534ddc483f4c936f872837a8bf39d2d225f18307c496954cbe7dffe119 |
| l000001 | Second line has trailing spaces. | e3baa7985e2bcdf6cf2d7363cec05456728a2a5d3f98b37542392f24e9a72162 |
| l000002 | Third and final line. | dd52bddca31b1a03b84525208f2d4e7dd2132f4b90621dd674eb3a69c9a2c428 |
These are frozen in tests/vectors/disclosure-v1/text_line_v1_native/N1.fixture.json.
5. Salts — standard mode is UNSALTED (privacy posture is first-class)
Standard text-line-v1 leaves are unsalted bare sha256(utf8(line)). The honest characterization is stronger than "an incidental proof_path sibling leaks":
- The standard .mbnt publishes EVERY leaf hash, including redacted lines. proofs.json carries merkle_leaves = the complete ordered list of every non-empty-line leaf hash, redacted lines included. A holder of the standard bundle has the exact sha256(canonical_line) of each withheld line and can guess-and-confirm it entirely offline, not only when a withheld line happens to sit on a revealed line's proof_path.
- Zero per-leaf entropy ⇒ identical withheld lines have identical leaf hashes. With no salt, no leaf_id, and no profile tag in the preimage (§4), two redacted lines with the same canonical content produce the same leaf hash, so an observer can tell which redacted lines share a value (cross-equality leak) without recovering it.
- A withheld line's recovery cost equals THAT LINE'S OWN entropy. Lines drawn from a small or structured space (a short status line, a boolean or enum, a date, a name from a known list, or a known-format identifier) are trivially recoverable: an adversary confirms a guess by hashing it once against the published leaf hash. Only genuinely high-entropy free-form lines are protected.
This is the anchor-time-chosen tradeoff, not a defect — the discloser accepted it by anchoring in standard mode. Do NOT use standard mode to withhold low-entropy or small-space sensitive lines; route that data to sealed mode (§5b), where redacted lines are unguessable and equal withheld lines do not collide. No keyless scheme can protect a redacted line that is itself the guessable secret; that is the cost of the no-keyfile requirement and the reason sealed exists. Choose sealed before anchoring if any withheld line could be low-entropy.
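To make the guess-and-confirm and cross-equality risks concrete, here is a small sketch of what a holder of a standard bundle could do offline. The withheld lines, the candidate list, and the helper names are all made up for illustration; only the bare sha256 leaf rule comes from this profile.

```javascript
const { createHash } = require("crypto");

const sha256Hex = (line) =>
  createHash("sha256").update(Buffer.from(line, "utf8")).digest("hex");

// Made-up withheld lines; an observer sees only their leaf hashes.
const withheld = [
  "status: approved",
  "Meet at the old pier, bring the blue folder.",
  "status: approved",
];
const merkleLeaves = withheld.map(sha256Hex);

// Guess-and-confirm: hash each candidate once and look it up.
function guessAndConfirm(leafHashes, candidates) {
  const byHash = new Map(candidates.map((c) => [sha256Hex(c), c]));
  return leafHashes.map((h) => (byHash.has(h) ? byHash.get(h) : null));
}

// Cross-equality: group leaf indexes that share a hash.
function equalLeafGroups(leafHashes) {
  const groups = new Map();
  leafHashes.forEach((h, i) => {
    if (!groups.has(h)) groups.set(h, []);
    groups.get(h).push(i);
  });
  return [...groups.values()].filter((idx) => idx.length > 1);
}

const recovered = guessAndConfirm(merkleLeaves, [
  "status: approved",
  "status: rejected",
  "status: pending",
]);
const shared = equalLeafGroups(merkleLeaves);
console.log(recovered); // [ 'status: approved', null, 'status: approved' ]
console.log(shared); // [ [ 0, 2 ] ]
```

The enum-like lines fall to a three-entry dictionary, while the free-form line is not recovered by that dictionary.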
A standard revealed[i] MUST NOT carry salt_b64. The structural schema treats salt_b64 as optional for text-line-v1; the verifier ignores any stray salt_b64 under the bare-sha256 standard rule.
5b. Sealed mode — HMAC leaf under a per-leaf HKDF salt (algo: "merkle-hmac-sha256")
For a SEALED text anchor (chunk_merkle.algo == "merkle-hmac-sha256", chunk_merkle.salt_version == "salt_v1"), the leaf is keyed:
salt_i = HKDF-SHA256(ikm = master_salt,
salt = "satsignal-sealed-v1/per-leaf",
info = "chunk/" || u32_be(i), L = 32)
leaf_hash_i = HMAC-SHA256(key = salt_i, msg = utf8(canonical_line_i))
This is the same per-leaf HKDF/HMAC derivation the sealed CSV anchor uses — the anchor's sealed merkle assembly is generic across file types (web/templates.py ~6056). Only the leaf hash differs from standard; canonicalization (§2), segmentation (§3), and the merkle (§6) are identical.
A sealed revealed[i] carries salt_b64 = base64(salt_i) — the PER-LEAF salt for that revealed line. salt_b64 is REQUIRED for a sealed leaf; a sealed carrier with a revealed leaf missing salt_b64 fails closed with sealed_leaf_missing_salt.
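The sealed derivation can be sketched with Node's built-in HKDF and HMAC. Two details are assumptions not stated in this profile: that the salt and info labels are encoded as UTF-8 bytes, and that salt_b64 uses standard padded base64. The function names are illustrative.

```javascript
const { hkdfSync, createHmac } = require("crypto");

// salt_i = HKDF-SHA256(ikm = master_salt,
//                      salt = "satsignal-sealed-v1/per-leaf",
//                      info = "chunk/" || u32_be(i), L = 32)
function perLeafSalt(masterSalt, i) {
  const indexBe = Buffer.alloc(4);
  indexBe.writeUInt32BE(i, 0);
  const info = Buffer.concat([Buffer.from("chunk/", "utf8"), indexBe]);
  const okm = hkdfSync(
    "sha256",
    masterSalt,
    Buffer.from("satsignal-sealed-v1/per-leaf", "utf8"), // assumed UTF-8 label
    info,
    32
  );
  return Buffer.from(okm); // hkdfSync returns an ArrayBuffer
}

// leaf_hash_i = HMAC-SHA256(key = salt_i, msg = utf8(canonical_line_i))
function leafHashSealed(masterSalt, i, canonicalLine) {
  return createHmac("sha256", perLeafSalt(masterSalt, i))
    .update(Buffer.from(canonicalLine, "utf8"))
    .digest("hex");
}

// Placeholder master salt for the sketch; a real one is a bearer secret.
const demoMaster = Buffer.alloc(32, 0x42);
const salt0 = perLeafSalt(demoMaster, 0);
console.log(salt0.length); // 32
console.log(salt0.toString("base64")); // what a revealed leaf would carry as salt_b64 (assumed encoding)
console.log(leafHashSealed(demoMaster, 0, "same line") === leafHashSealed(demoMaster, 1, "same line")); // false
```

Because the leaf index feeds the HKDF info, two identical lines at different positions get different salts and therefore different leaf hashes, which removes the cross-equality leak of standard mode.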
5b.1 What a sealed disclosure carries — per-leaf salt, NEVER the master
The redact tool reads the 32-byte master salt from the SOURCE .mbnt manifest.json (salt_b64, base64url) and derives the per-leaf salts. The disclosure output carries ONLY the per-leaf salts of the revealed lines. THE MASTER-SALT-STRIP RULE (forever): a disclosure .mbnt MUST NOT contain the master salt in any encoding, and MUST NOT carry a redacted line's per-leaf salt. Shipping the master salt re-derives every per-leaf salt and unseals every redacted line. The tool enforces this structurally (it never ships the source manifest.json) and with a P0 runtime guard (redact-core.mjs:_assertMasterSaltStripped, scheme/mode-independent). Revealing the per-leaf HKDF salts of revealed lines leaks nothing about the master salt or other lines (HKDF-Expand is a PRF).
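One way to picture a runtime guard of this kind is a byte scan of the outgoing disclosure for common encodings of the master salt. The sketch below is an illustration only and is not the implementation of _assertMasterSaltStripped; findMasterSaltLeak is a hypothetical name, and it checks just raw bytes, hex, and unpadded base64 / base64url, so it cannot by itself satisfy "any encoding" (for example, it would miss the salt embedded at a different base64 alignment inside a larger blob).

```javascript
// Hypothetical leak scan: returns the names of encodings under which the
// master salt appears inside the disclosure bytes.
function findMasterSaltLeak(disclosureBytes, masterSalt) {
  const hex = masterSalt.toString("hex");
  // Strip padding so padded and unpadded base64 forms both match.
  const b64 = masterSalt.toString("base64").replace(/=+$/, "");
  const b64url = b64.replace(/\+/g, "-").replace(/\//g, "_");
  const needles = [
    ["raw", masterSalt],
    ["hex", Buffer.from(hex, "ascii")],
    ["HEX", Buffer.from(hex.toUpperCase(), "ascii")],
    ["base64", Buffer.from(b64, "ascii")],
    ["base64url", Buffer.from(b64url, "ascii")],
  ];
  return needles
    .filter(([, needle]) => disclosureBytes.includes(needle))
    .map(([name]) => name);
}

// Demo values: 0x00..0x1f as the master, a constant buffer as a per-leaf salt.
const masterSalt = Buffer.from(Array.from({ length: 32 }, (_, i) => i));
const cleanDisclosure = Buffer.from(
  JSON.stringify({
    revealed: [{ leaf_id: "l000000", salt_b64: Buffer.alloc(32, 0xab).toString("base64") }],
  }),
  "utf8"
);
const leakyDisclosure = Buffer.from(
  JSON.stringify({ salt_b64: masterSalt.toString("base64") }),
  "utf8"
);
console.log(findMasterSaltLeak(cleanDisclosure, masterSalt).length); // 0
console.log(findMasterSaltLeak(leakyDisclosure, masterSalt).includes("base64")); // true
```

A guard built this way would fail closed whenever the returned list is non-empty, regardless of scheme or mode.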
5b.2 Privacy posture
A sealed redacted line is unguessable: its leaf is an HMAC under a per-leaf salt the verifier cannot derive without the master salt, which the disclosure never carries. Standard = disclosed-lines-only guarantee with brute-forceable redacted lines; sealed = redacted lines stay private. The choice is made at anchor time.
5b.3 Worked example (NOT placeholders)
Same 3-line source as §4; master salt = 0x00 0x01 … 0x1f (the bearer secret, NEVER shipped). Sealed leaves:
| leaf_id | value | HMAC(salt_i, utf8(value)) |
|---|---|---|
| l000000 | First line of the memo. | 625041249f20c24a50eeb4dde7e121520a245e0d7e03b1b7b3b09f8b6f94d48d |
| l000002 | Third and final line. | ac4d39834feb39faa165dd89fd4888a28b6280411f953522ba512cd00f45f415 |
Frozen in tests/vectors/disclosure-v1/text_line_v1_native_sealed/S1.fixture.json.
6. Merkle behavior — DUPLICATE-LAST on odd
The tree is duplicate-last-on-odd, identical to csv-row-v1 and to the anchor (merkleRootFromHexLeaves / merkleRootFromLeafBytes): at each level an odd last node pairs with itself (SHA-256(node || node)). The verifier only walks proof_path — it never rebuilds the root — so the duplicate-last tree verifies with no merkle-walk change. The redact tool emits duplicate-last-correct paths (a self-sibling entry for the odd-promoted node).
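A duplicate-last-on-odd root computation might be sketched as follows. It assumes that node concatenation operates on the raw 32-byte digests (not on hex text) and that a single-leaf tree has the leaf itself as root; neither detail is spelled out in this section. merkleRootDuplicateLast is an illustrative name, not the anchor's merkleRootFromHexLeaves.

```javascript
const { createHash } = require("crypto");

const sha256 = (buf) => createHash("sha256").update(buf).digest();

// Duplicate-last-on-odd: at each level an odd last node pairs with itself.
function merkleRootDuplicateLast(hexLeaves) {
  if (hexLeaves.length === 0) throw new Error("no leaves");
  let level = hexLeaves.map((h) => Buffer.from(h, "hex"));
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      // Odd last node: self-pair, i.e. SHA-256(node || node).
      const right = i + 1 < level.length ? level[i + 1] : left;
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return level[0].toString("hex");
}

// Three demo leaves (hashes of arbitrary strings, not the fixture values).
const demoLeaves = ["one", "two", "three"].map((s) =>
  sha256(Buffer.from(s, "utf8")).toString("hex")
);
console.log(merkleRootDuplicateLast(demoLeaves).length); // 64
```

With three leaves the third node is hashed with itself at level 0, so the tree has the same depth for every leaf.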
Worked example (the §4 three-leaf tree)
Leaves A=l000000, B=l000001, C=l000002 (the §4 hashes).
- Level 0 → 1: pair A,B → L1[0] = SHA-256(A || B) = c5ec33d952be464863cf30c6cb6eb90ddbba981c6b22ff4e0acc71728408569d. C is the odd last node → self-pairs → L1[1] = SHA-256(C || C).
- ROOT = SHA-256(L1[0] || L1[1]) = 5e4f6278d3e8f1a8175e8f635e76f828262e48b3f4e5b6ffd326357cc04a607c.
Proof paths (frozen in N1):
- l000000 (A): [{R, B}, {R, L1[1]}] (B is the level-0 sibling; L1[1] at level 1).
- l000002 (C): the two-entry self-sibling path [{R, C(itself)}, {L, L1[0]}]: fold C against its own hash to reach L1[1], then against L1[0] to reach the root. The verifier MUST walk this; it must NOT reject the self-sibling entry or assume promote-unchanged.
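The proof paths above, including the self-sibling entry, can be reproduced with a short sketch. The { side, hash } entry shape and the function names are assumptions for illustration ("R" means the sibling sits on the right of the running node, "L" on the left), and nodes are again assumed to be concatenated as raw bytes.

```javascript
const { createHash } = require("crypto");

const sha256 = (buf) => createHash("sha256").update(buf).digest();

// Build a duplicate-last-correct path for one leaf and return the root.
function buildProofPath(hexLeaves, index) {
  let level = hexLeaves.map((h) => Buffer.from(h, "hex"));
  let i = index;
  const path = [];
  while (level.length > 1) {
    const isRightChild = i % 2 === 1;
    // An odd last node has no neighbour, so its sibling is itself.
    const siblingIndex = isRightChild ? i - 1 : Math.min(i + 1, level.length - 1);
    path.push({
      side: isRightChild ? "L" : "R",
      hash: level[siblingIndex].toString("hex"),
    });
    const next = [];
    for (let j = 0; j < level.length; j += 2) {
      const right = j + 1 < level.length ? level[j + 1] : level[j];
      next.push(sha256(Buffer.concat([level[j], right])));
    }
    level = next;
    i = Math.floor(i / 2);
  }
  return { path, root: level[0].toString("hex") };
}

// Verifier side: fold the leaf hash along the path; never rebuild the tree.
function walkProofPath(leafHex, path) {
  let node = Buffer.from(leafHex, "hex");
  for (const step of path) {
    const sibling = Buffer.from(step.hash, "hex");
    node = step.side === "L"
      ? sha256(Buffer.concat([sibling, node]))
      : sha256(Buffer.concat([node, sibling]));
  }
  return node.toString("hex");
}

// Demo leaves A, B, C (hashes of arbitrary strings, not the fixture values).
const abc = ["A", "B", "C"].map((s) => sha256(Buffer.from(s, "utf8")).toString("hex"));
const proofC = buildProofPath(abc, 2);
console.log(proofC.path.map((p) => p.side)); // [ 'R', 'L' ]
console.log(proofC.path[0].hash === abc[2]); // true (self-sibling entry)
console.log(walkProofPath(abc[2], proofC.path) === proofC.root); // true
```

The walk treats the self-sibling entry like any other step, which is why a verifier that only folds proof_path needs no special case for odd levels.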
7. Original anchor binding
A disclosure binds to the existing anchor via the §4 chain of disclosure-v1.md: the carrier canonical.json (carried VERBATIM) hashes to the on-chain document_hash; its subject.proofs.chunk_merkle.root equals linked_anchor.root; its scheme equals linked_anchor.subject_profile == "text-line-v1"; and its algo selects the leaf rule (sha256 standard / merkle-hmac-sha256 sealed). The redact tool recomputes the leaves from the original file, hard-fails if they do not match the committed merkle_leaves + root (wrong file / wrong bundle / edited file), then builds proof paths for the revealed lines. No re-anchor; no new scheme.
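The binding chain in this paragraph could be checked roughly as below. Several things here are assumptions: that document_hash is the SHA-256 of the verbatim canonical.json bytes, the order of the checks, and the mapping of each failed check to the failure codes listed in §8. checkAnchorBinding is a hypothetical helper, not part of the tool.

```javascript
const { createHash } = require("crypto");

function fail(code) {
  const err = new Error(code);
  err.code = code;
  return err;
}

// Returns "standard" or "sealed" when the carrier binds to the anchor.
function checkAnchorBinding(canonicalJsonBytes, linkedAnchor, documentHashHex) {
  // Assumption: document_hash = sha256 over the verbatim canonical.json bytes.
  const digest = createHash("sha256").update(canonicalJsonBytes).digest("hex");
  if (digest !== documentHashHex) throw fail("linked_anchor_canonical_hash_mismatch");

  const cm = JSON.parse(canonicalJsonBytes.toString("utf8")).subject.proofs.chunk_merkle;
  if (cm.root !== linkedAnchor.root) throw fail("linked_anchor_root_mismatch");
  if (cm.scheme !== "text-line-v1" || linkedAnchor.subject_profile !== cm.scheme) {
    throw fail("linked_anchor_profile_mismatch");
  }
  // algo selects the leaf rule.
  if (cm.algo === "sha256") return "standard";
  if (cm.algo === "merkle-hmac-sha256" && cm.salt_version === "salt_v1") return "sealed";
  throw fail("unsupported_linked_algo");
}

// Made-up carrier for the sketch; the root is a placeholder value.
const carrierBytes = Buffer.from(
  JSON.stringify({
    subject: { proofs: { chunk_merkle: { scheme: "text-line-v1", algo: "sha256", root: "ab".repeat(32) } } },
  }),
  "utf8"
);
const documentHash = createHash("sha256").update(carrierBytes).digest("hex");
const anchor = { root: "ab".repeat(32), subject_profile: "text-line-v1" };
console.log(checkAnchorBinding(carrierBytes, anchor, documentHash)); // standard
```

Hashing the carrier bytes before parsing them reflects the VERBATIM requirement: any re-serialization of canonical.json would change the digest.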
The redacted copy emits the canonical non-empty lines (NFC, trailing-ws stripped, blank lines dropped) — revealed lines as their value, redacted lines as [REDACTED], positions preserved among the leaf-set, \n-joined. This is what is cryptographically attested; presentation.format == "txt", presentation.view_sha256 == sha256(redacted bytes).
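A sketch of how the redacted copy and its view_sha256 might be produced from the leaf-set follows. It assumes the \n-join adds no trailing newline and that the view bytes are the UTF-8 encoding of the joined text; renderRedactedView is a hypothetical name.

```javascript
const { createHash } = require("crypto");

// canonicalLeaves: canonical non-empty lines in leaf order.
// revealedIndexes: leaf indexes the discloser chooses to reveal.
function renderRedactedView(canonicalLeaves, revealedIndexes) {
  const revealed = new Set(revealedIndexes);
  const text = canonicalLeaves
    .map((line, i) => (revealed.has(i) ? line : "[REDACTED]"))
    .join("\n"); // assumption: no trailing newline
  const bytes = Buffer.from(text, "utf8");
  return {
    text,
    view_sha256: createHash("sha256").update(bytes).digest("hex"),
  };
}

const view = renderRedactedView(
  ["First line of the memo.", "Second line has trailing spaces.", "Third and final line."],
  [0, 2]
);
console.log(view.text);
// First line of the memo.
// [REDACTED]
// Third and final line.
```

Positions are preserved among the leaf-set, so the placeholder sits exactly where leaf l000001 would appear; the blank source line is not represented at all.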
8. Fixtures (test vectors)
[FOREVER-CONTRACT] — disclosure-v1.md §11 forbids a profile without vectors. Frozen, oracle-computed + tool-cross-checked:
- Standard: tests/vectors/disclosure-v1/text_line_v1_native/
  - N1: the §4 happy path (reveal l000000 + l000002, redact l000001; no salt_b64).
  - N2_linked_anchor_profile_mismatch: carrier scheme text-line-v2, root equal → linked_anchor_profile_mismatch.
  - N3_empty_line_included_mistake: a non-conformant discloser kept a blank line as a leaf (wrong tree); the bare-sha256 leaf still matches but the wrong-tree path misses the root → merkle_path_mismatch (pins §3 drop-empties).
  - negatives/ overlays: leaf_hash_mismatch, merkle_path_mismatch, linked_anchor_root_mismatch, linked_anchor_canonical_hash_mismatch.
- Sealed: tests/vectors/disclosure-v1/text_line_v1_native_sealed/
  - S1: the §5b happy path (per-leaf salt_b64; odd-last self-sibling).
  - negatives/: S1_leaf_hash_mismatch, S1_wrong_salt, S1_merkle_path_mismatch, S1_linked_anchor_root_mismatch, S1_linked_anchor_canonical_hash_mismatch, S2_missing_salt (sealed_leaf_missing_salt), S3_wrong_salt_version (unsupported_linked_algo), S4_linked_anchor_profile_mismatch.
9. Out of scope / deprecation pointers
- Sentence/paragraph granularity is NOT this profile. The deprecated salted satsignal.text.paragraph_sentence.v1 (text-paragraph-sentence-v1.md) chunks by sentence with a profile||0x00||leaf_id||0x00||value||0x00||salt preimage and cannot bind to a live text-line-v1 anchor. It stays inert (allowlist literals are never removed; its frozen corpus is a regression guard only). A future text-sentence-v1 ANCHOR scheme + a de-salted native profile is a separate effort.
- Binary text containers (PDF text, .docx) anchor under their own schemes (pdf-page-v1, zip-file-v1) and are out of scope here.
11. Profile registry pointer
Registered in disclosure-v1.md §11. text-line-v1 is the native text-line rule that text anchors actually emit; a disclosure binds to the chunk_merkle the anchor already committed (scheme == "text-line-v1"), revealing a subset of its per-line leaves — no re-anchor, no new scheme. Leaf rule: §§2–4 standard, §5b sealed; merkle §6; binding §7; vectors §8.
Questions about this specification? Email hello@satsignal.cloud.