satsignal.json.field.v1 — JSON-field selective-disclosure profile
DEPRECATED / INERT (read this first). This satsignal.json.field.v1 dotted profile is deprecated and inert. It defined a salted rule with one leaf per nested primitive value addressed by RFC-6901 JSON Pointer (every deep field a leaf) and a salted/framed leaf preimage. No production flow emits or consumes it. Its allowlist literal is retained forever (an allowlist literal is never removed) and its frozen regression corpus is kept solely as a regression-guard record; it is never produced or verified by any live path. The rules below stay frozen for that regression corpus; do not implement them for new work.
Live successor: json-keypath-v1. New JSON disclosures use the native json-keypath-v1 literal, which binds to the chunk_merkle a JSON anchor already commits. Note the granularity difference: this deprecated profile hashed every deep RFC-6901 pointer (salted); the live json-keypath-v1 segments by top-level key (key:jcs(value) entry, bare/sealed leaf, native merkle binding; no re-anchor, no salt keyfile in standard mode). The two cannot interbind. Authority for the deprecation: disclosure-v1 §11.
Versioning (2026-05-27). This is satsignal.json.field.v1. The profile literal is fixed at "satsignal.json.field.v1" and is bound into the leaf-hash preimage at anchor time. This literal is a forever-contract: once any client has anchored under it, the segmentation, canonicalization, normalization, salting, and leaf_id construction rules pinned below are fixed for that literal forever. A bug in those rules cannot be patched in place; the only remedy is a new satsignal.json.field.vN+1 profile that compatible verifiers must support in parallel.
Status: draft 1, 2026-05-27.
Audience: integrators who anchor a structured JSON document (configs, invoices, API records, structured event payloads) and later want to publish a redacted view revealing a chosen subset of fields with cryptographic proof that those fields are members of the original anchor; verifier authors who must reproduce per-leaf hashes byte-for-byte to render and check such a view.
Goal: define exactly one set of rules (input encoding, canonical form, leaf extraction, leaf identifier, leaf-hash preimage, salt strategy) under which a JSON document is segmented into a leaf-set whose merkle root commits to every primitive value in the document, so that a later disclosure can selectively reveal any subset of those values.
1. Why this exists
A great deal of the data the notary is asked to anchor arrives as structured JSON: configuration snapshots, invoices, API request / response records, IoT readings, audit-log events. The anchorer often wants to commit the whole record at time T1, then later publish a partial view of it — a few specific fields — without revealing the other fields and without re-anchoring. Selective disclosure (../disclosure-v1.md) provides the machinery; this profile pins the per-leaf rules that make it work for JSON.
The design strategy:
- One leaf per primitive value. Container nodes (objects, arrays) are not leaves; their primitive descendants are. This matches how an anchorer thinks about a JSON document ("I want to disclose customer.name and items[3].price") and makes JSON Pointer the natural leaf_id.
- JCS-canonicalize once at the top of the pipeline. Before any leaf extraction or hashing happens, the input is run through RFC 8785 JCS. All downstream rules operate on the canonical form. Key reordering, number formatting, whitespace, and string-escape differences vanish at that boundary (JCS does not apply Unicode normalization, so differently-normalized strings remain distinct); both anchorer and verifier produce the same canonical bytes for any structurally-equivalent input.
- JSON Pointer for leaf_id. RFC 6901 is the obvious choice: it is the standard way to address a value within a JSON document, has well-defined escaping, and survives canonicalization (the pointer to customer.name is /customer/name whether the source had keys in any order).
- Forbid leaf-less documents. A document that JCS-canonicalizes to an empty container or has no primitive descendants has nothing to commit; the anchorer must add at least one primitive (a sentinel field, e.g. {"empty": true}) so the leaf-set is non-trivial.
What this profile deliberately does NOT do:
- It does not redact, project, or schema-validate the input. It hashes every primitive value present in the canonical form.
- It does not support selective disclosure of container nodes (objects or arrays as such). Disclosing /items is not a profile operation; the anchorer discloses the primitive leaves underneath (/items/0/sku, /items/0/qty, …).
- It does not introduce a semantic type system. Values are primitives: string, number, boolean, null. Distinguishing "ISO date string" from "free-text string" is the anchorer's concern.
- It does not assume a JSON Schema. JSON Schema integration is a future profile, not v1.
2. Inputs and canonicalization
2.1 Encoding
Decision (forever): UTF-8 mandatory. Inputs that are not valid UTF-8 fail closed at preprocessing with invalid_utf8_input before any parsing is attempted. Rationale: JSON's interchange encoding is UTF-8 per RFC 8259 §8.1; admitting other encodings would force every verifier to ship multi-codec recovery and would break canonical-bytes reproducibility.
2.2 BOM handling
Decision (forever): A single leading UTF-8 byte-order mark (EF BB BF) at byte offset 0 is stripped before parsing. BOMs that appear elsewhere in the byte stream are NOT stripped; they live inside string values as the Unicode character U+FEFF and are part of the value. Rationale: RFC 8259 §8.1 forbids a leading BOM but real-world tooling emits one anyway; tolerating exactly the leading case is the documented industry workaround. Stripping an interior U+FEFF would silently corrupt string content.
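The encoding and BOM rules of §2.1 and §2.2 can be sketched as a single preprocessing step. This is a minimal illustration, not a reference implementation; the function name preprocess is hypothetical, and only the error code invalid_utf8_input comes from this profile.

```python
BOM = b"\xef\xbb\xbf"

def preprocess(raw: bytes) -> str:
    """Strip one leading BOM, then require strict UTF-8 (fail closed).

    Hypothetical helper: the name and signature are illustrative only.
    """
    if raw.startswith(BOM):
        raw = raw[len(BOM):]  # only the BOM at byte offset 0 is removed
    try:
        # The plain "utf-8" codec never strips U+FEFF, so interior BOMs survive.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("invalid_utf8_input") from exc
```

An interior EF BB BF sequence stays in the decoded text as U+FEFF, which is the behavior §2.2 asks for.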
2.3 JSON validity
Decision (forever): Input MUST be valid JSON per RFC 8259. The following extensions are NOT permitted and fail closed with invalid_json_input:
- comments (// … or /* … */)
- trailing commas ([1, 2, 3,])
- single-quote strings ('hello')
- unquoted object keys ({foo: 1})
- bare control characters inside strings
- NaN, +Infinity, -Infinity as number tokens
Rationale: the canonicalization step is defined only over strict RFC 8259 inputs; admitting JSON5 / JSONC variants opens an unbounded set of edge cases that would be forever-contracted into the profile.
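As a rough illustration of the fail-closed parse, the sketch below uses Python's standard json module, which already rejects comments, trailing commas, single-quote strings, unquoted keys, and bare control characters; the NaN and Infinity tokens it would otherwise accept are rejected through parse_constant. The helper names parse_strict and is_strict_json are hypothetical, and the standard module is only an approximation of a strict RFC 8259 parser (it accepts duplicate keys, for instance).

```python
import json

def _reject_constant(token: str):
    # Called by the json module for the tokens NaN, Infinity and -Infinity.
    raise ValueError("invalid_json_input")

def parse_strict(text: str):
    """Parse JSON text, failing closed on the extensions listed in section 2.3."""
    try:
        return json.loads(text, parse_constant=_reject_constant)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError("invalid_json_input") from exc

def is_strict_json(text: str) -> bool:
    """Convenience wrapper for checks: True if parse_strict accepts the text."""
    try:
        parse_strict(text)
    except ValueError:
        return False
    return True
```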
2.4 Canonical form
Decision (forever): the input is run through RFC 8785 JCS before any further processing. JCS pins, in summary:
- object keys sorted by UTF-16 code-unit order
- no insignificant whitespace inside objects or arrays
- numbers in the shortest unambiguous form that round-trips through IEEE 754 double (per ECMA-262 Number.prototype.toString)
- strings re-encoded with minimal escaping (only ", \, \b, \t, \n, \f, \r, and U+0000–U+001F controls are escaped; non-ASCII code points are emitted as their UTF-8 bytes)
- UTF-8 output bytes
A verifier MUST reproduce JCS output byte-for-byte; deviation breaks the leaf-hash preimage. Rationale: JCS is the only widely-deployed spec that pins JSON canonical bytes; reinventing canonicalization inside this profile would be a forever-contract on every edge case JCS already settled.
Forever-contract: use a real RFC 8785 implementation. Canonical form is RFC 8785 JCS, not an approximation of it. A profile-conformant anchorer or verifier MUST use a real RFC 8785 implementation (e.g. the Python jcs PyPI module ≥ 0.2.1, or any other library whose test suite passes the RFC 8785 reference vectors). Hand-rolled approximations of JCS (typically "sorted keys + minimal whitespace + Python repr for floats") are not interchangeable with RFC 8785 on two specific kinds of input:
- Supplementary-plane code points (U+10000–U+10FFFF). RFC 8785 §3.2.3 sorts object keys by UTF-16 code-unit order, not Unicode codepoint order. The two orderings diverge for keys that contain astral characters: e.g. U+1F389 (🎉) encodes as the UTF-16 surrogate pair D83C DF89, so under UTF-16 code-unit sort it sorts AFTER any BMP key whose first code unit is below 0xD83C (including all of U+0000–U+D7FF) but BEFORE BMP keys whose first code unit is in 0xD83C–0xFFFF (in well-formed strings, U+E000–U+FFFF). A naive Python sorted(keys) puts the same key strictly AFTER every BMP code point. Pairs of keys exist where the two orderings produce different output bytes.
- Floating-point values. RFC 8785 §3.2.2.3 mandates the ECMAScript 6.0 Number.prototype.toString algorithm, the "shortest decimal that round-trips through IEEE 754 binary64." Python's repr is a close approximation but is not byte-for-byte identical to ECMAScript's output for every input; for trust applications the only safe choice is a library that tracks the RFC.
This is a forever-contract: any future implementation that wants to verify a satsignal.json.field.v1 anchor MUST use real RFC 8785 canonicalization. See §11 ("Why not a hand-rolled JCS") for the historical context behind this constraint, and the worked examples B11–B13 (§8) for inputs that specifically exercise the divergent edges.
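The key-ordering divergence can be seen with a small sketch. It sorts the same keys twice: once with Python's default codepoint comparison and once by UTF-16 code units (comparing big-endian UTF-16 bytes is equivalent to comparing code units). The helper name jcs_key_order and the choice of U+FF21 as the BMP key are illustrative assumptions, and the sketch is not a substitute for a real RFC 8785 library.

```python
def jcs_key_order(keys):
    """Sort keys by UTF-16 code units, as RFC 8785 key sorting requires."""
    # UTF-16BE bytes compare in the same order as the 16-bit code units.
    return sorted(keys, key=lambda k: k.encode("utf-16-be"))

# "a" (U+0061), fullwidth A (U+FF21, a BMP key above D83C), party popper (U+1F389)
keys = ["\uff21", "\U0001f389", "a"]

codepoint_order = sorted(keys)       # naive Python sort: by code point
utf16_order = jcs_key_order(keys)    # JCS-style sort: by UTF-16 code unit

# The astral key moves ahead of U+FF21 under UTF-16 ordering.
diverges = codepoint_order != utf16_order
```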
2.5 Top-level type
Decision (forever): the top-level value MUST be a JSON object or a JSON array. Top-level primitives — a bare string, number, boolean, or null at the root — fail closed with invalid_top_level_type. Rationale: a top-level primitive has no leaf-set in a meaningful sense — there is one trivially-addressed value at the root (JSON Pointer "") and no point in segmenting it. Anchorers who want to commit a single primitive value can wrap it: {"value": "..."}.
2.6 Number canonicalization
Decision (forever): per JCS. Integer-valued numbers serialize as bare decimals without a trailing decimal point. Floats serialize as their shortest round-trip representation. Exponent forms collapse to integer form when the value is an integer in IEEE 754 double with magnitude below 1e21 (at 1e21 and above, the ECMAScript serialization keeps exponent notation, e.g. 1e+21). Examples (canonical form on the right):
| Input | JCS canonical form |
|---|---|
| 1e10 | 10000000000 |
| 10000000000 | 10000000000 |
| 10000000000.0 | 10000000000 |
| 1.0 | 1 |
| 3.14 | 3.14 |
| -0 | 0 |
| 1.5e-2 | 0.015 |
Worked example: {"x": 1e10} and {"x": 10000000000} and {"x": 10000000000.0} all canonicalize to the byte sequence {"x":10000000000} and produce identical leaves. See fixture B6.
NaN, +Infinity, -Infinity are not legal JSON numbers and are rejected per §2.3.
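To make the number rules concrete, here is a hedged sketch of ECMAScript-style number formatting built on top of Python's repr. It assumes repr yields the shortest round-trip digit string and only re-applies the ECMAScript layout rules (plain decimals up to 21 integer digits, exponent notation otherwise). The function name es_number is hypothetical, and a conformant implementation should still rely on a real RFC 8785 library.

```python
import math
from decimal import Decimal

def es_number(value) -> str:
    """Format a finite number in the ECMAScript Number-to-string layout (sketch)."""
    x = float(value)
    if math.isnan(x) or math.isinf(x):
        raise ValueError("invalid_json_input")
    if x == 0:
        return "0"  # also covers -0
    sign = "-" if x < 0 else ""
    # Assumption: repr() gives the shortest digits that round-trip.
    _, digit_tuple, exp10 = Decimal(repr(abs(x))).as_tuple()
    n = len(digit_tuple) + exp10          # position of the decimal point
    digits = "".join(map(str, digit_tuple)).rstrip("0")
    k = len(digits)
    if k <= n <= 21:                      # integer: pad with zeros
        body = digits + "0" * (n - k)
    elif 0 < n <= 21:                     # decimal point inside the digits
        body = digits[:n] + "." + digits[n:]
    elif -6 < n <= 0:                     # small fraction: leading zeros
        body = "0." + "0" * (-n) + digits
    else:                                 # exponent notation
        e = n - 1
        mantissa = digits[0] + ("." + digits[1:] if k > 1 else "")
        body = f"{mantissa}e{'+' if e > 0 else '-'}{abs(e)}"
    return sign + body
```

Values such as 1e16 and 1e-7 show where plain repr departs from this layout: repr prints 1e+16 and 1e-07 for them.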
3. Leaf extraction
3.1 What is a leaf
Decision (forever): every primitive value (string, number, true, false, null) reachable from the root via JCS-canonical traversal is a leaf. Objects and arrays are NOT leaves — they are container nodes whose primitive descendants ARE the leaves. Rationale: the disclosure model assumes the anchorer wants to reveal individual scalar facts ("the customer's name", "the price of item 3"). Treating a container as a leaf would either commit to its full JCS bytes (defeating per-field disclosure) or require a separate sub-tree commitment (a different leaf-hash preimage — not v1).
Example. The document
{"a": {"b": 1, "c": "x"}, "d": [10, 20]}
produces four leaves at JSON Pointer paths /a/b, /a/c, /d/0, /d/1. The object /a, the document root, and the array /d are NOT leaves; they are containers walked to find the primitives.
3.2 leaf_id construction
Decision (forever): the leaf_id is the RFC 6901 JSON Pointer that addresses the leaf in the canonical form. Standard RFC 6901 escaping applies to object keys:
- ~ → ~0
- / → ~1
(Escaping order matters: replace ~ first, then /; this is the RFC 6901 contract and the rule any pointer-aware library implements.)
Array elements address by zero-based decimal index with no leading zeros. Rationale: JSON Pointer is the standard cross-implementation way to address a value inside a JSON document; it has stable semantics, well-defined escaping, and avoids reinventing a pointer syntax inside this profile.
Examples:
| Position in document | leaf_id |
|---|---|
| object key name at root | /name |
| object key name inside customer | /customer/name |
| array index 3 of root key items | /items/3 |
| key sku inside element 0 of root items | /items/0/sku |
| key with / in its name: {"a/b": ...} | /a~1b |
| key with ~ in its name: {"c~d": ...} | /c~0d |
| Unicode key: {"日本語": ...} | /日本語 (raw UTF-8 bytes) |
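The leaf_id construction in the table above can be sketched as follows. The helpers escape_token and pointer are hypothetical names; the escaping order (~ first, then /) is the RFC 6901 rule stated in this section.

```python
def escape_token(key: str) -> str:
    """RFC 6901 escaping for one object key: '~' first, then '/'."""
    return key.replace("~", "~0").replace("/", "~1")

def pointer(path) -> str:
    """Build a JSON Pointer from a list of object keys (str) and array indexes (int)."""
    tokens = []
    for step in path:
        if isinstance(step, int):
            tokens.append(str(step))      # zero-based decimal, no leading zeros
        else:
            tokens.append(escape_token(step))
    return "".join("/" + token for token in tokens)
```

Reversing the replacement order would turn a literal / into ~1 and then into ~01, which is why the order is pinned.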
3.3 Leaf ordering
Decision (forever): depth-first, JCS-canonical-key order. The anchorer walks the JCS-canonicalized document:
- At each object node, visit keys in the JCS-canonical order (UTF-16 code-unit sort, identical to JCS object-key ordering).
- At each array node, visit elements in increasing index order (0, 1, 2, …).
- Recurse depth-first; emit a leaf record when reaching a primitive value.
Leaf 0 is the first primitive emitted by this walk; leaf N-1 is the last. This ordering defines the leaf-set's positions in the merkle tree (see ../disclosure-v1.md §3.4 invariant 4). Rationale: a fixed total order is required for the merkle tree to have a single root; depth-first JCS-canonical-key order is the unique walk that an anchorer and a verifier can both reproduce from the canonical bytes alone, with no auxiliary table.
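The walk described above might look like the sketch below, which operates on an already-parsed document and yields (leaf_id, value) pairs in leaf order. The generator name walk is hypothetical; keys are sorted by UTF-16 code units to mirror JCS ordering, and pointer escaping follows §3.2.

```python
def walk(node, prefix=""):
    """Yield (leaf_id, primitive) pairs depth-first in JCS key order (sketch)."""
    if isinstance(node, dict):
        # UTF-16BE byte order equals UTF-16 code-unit order.
        for key in sorted(node, key=lambda k: k.encode("utf-16-be")):
            token = key.replace("~", "~0").replace("/", "~1")
            yield from walk(node[key], f"{prefix}/{token}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk(item, f"{prefix}/{index}")
    else:
        yield prefix, node  # string, number, true, false or null

example = {"a": {"b": 1, "c": "x"}, "d": [10, 20]}
leaf_ids = [leaf_id for leaf_id, _ in walk(example)]
```

Empty containers fall through both loops without yielding, so they contribute no leaves.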
3.4 Empty containers
Decision (forever): an empty object {} and an empty array [] contribute zero leaves. They are not themselves leaves and have no primitive descendants; they neither appear in the leaf-set nor increment the leaf count. Rationale: an empty container has no committable value. Anchorers who need to commit "this field is intentionally empty" can substitute a sentinel primitive, e.g. null (which DOES produce a leaf with value_canonical_bytes = null) instead of {} or [].
3.5 Documents with zero leaves
Decision (forever): forbidden. A canonical input that produces zero leaves — top-level {}, top-level [], or any nested structure with no primitive descendants — fails closed with empty_leaf_set. Rationale: a leaf-set of size zero has no merkle root to commit; the anchor would have nothing to bind the disclosure into.
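A minimal sketch of the zero-leaf check, assuming the document has already been parsed into Python dicts and lists. The helper names count_leaves and require_leaves are hypothetical; only the error code empty_leaf_set comes from this profile.

```python
def count_leaves(node) -> int:
    """Count primitive descendants; empty containers contribute zero."""
    if isinstance(node, dict):
        return sum(count_leaves(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_leaves(item) for item in node)
    return 1  # any primitive, including null, is one leaf

def require_leaves(doc) -> int:
    """Return the leaf count, failing closed when there is nothing to commit."""
    total = count_leaves(doc)
    if total == 0:
        raise ValueError("empty_leaf_set")
    return total
```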
4. Leaf-hash preimage
4.1 Byte layout
Decision (forever): the leaf hash is computed over the following byte sequence. The layout is shared with the other v1 profiles in this set — same separator byte, same field order, same encoding rules — so a cross-profile reader implements one preimage builder and only the per-field rules vary.
leaf_hash = sha256(
profile_literal_utf8 // "satsignal.json.field.v1" as UTF-8 bytes (23 bytes)
|| 0x00 // separator
|| leaf_id_utf8 // JSON Pointer as UTF-8 bytes (e.g. "/items/0/sku")
|| 0x00 // separator
|| value_canonical_bytes // see §4.2
|| 0x00 // separator
|| salt_bytes // raw bytes from base64-decoding salt_b64 (16 bytes)
)
The three 0x00 separator bytes are pinned forever for v1 and are identical across the v1 profile set. Rationale: the separators prevent boundary smuggling: an attacker who can choose leaf_id and value should not be able to craft a value-suffix that, concatenated with the next field's prefix, produces the same preimage as a different (leaf_id, value) pair. Without separators, leaf_id="/a" with the number value 12 and leaf_id="/a1" with the number value 2 would both contribute the bytes /a12 and produce identical preimage bytes; with 0x00 separators they differ because the JSON-canonical encoding of a string value cannot itself contain 0x00 (control characters MUST be escaped as \uXXXX per §2.4) and a JSON Pointer cannot contain 0x00 either (it would be invalid UTF-8 if inserted raw, and ~-escaping does not produce nulls).
4.2 value_canonical_bytes
Decision (forever): the leaf's primitive value, serialized in JCS form (§2.4) as UTF-8 bytes. The JSON-token form IS the canonical value — the surrounding quotes for strings, the lowercase true/false/null literals, the canonical number form. The verifier does NOT unwrap string quotes before hashing.
Examples (all in UTF-8 bytes):
| Primitive value | value_canonical_bytes (text) | Length |
|---|---|---|
| string "hello" | "hello" | 7 |
| string "" | "" | 2 |
| string "x" | "x" | 3 |
| number 42 | 42 | 2 |
| number 0 | 0 | 1 |
| number -1 | -1 | 2 |
| number 3.14 | 3.14 | 4 |
| boolean true | true | 4 |
| boolean false | false | 5 |
| null | null | 4 |
| string "42" | "42" | 4 |
| string "he said \"hi\"" | "he said \"hi\"" (JCS-escaped form) | 16 |
Rationale: keeping the JSON-token form prevents two distinct primitives from colliding under hash. Without quotes, the string "42" and the number 42 would produce identical value_canonical_bytes (both 42) and identical leaf_hash — breaking the property that distinct leaves produce distinct commitments. The JSON-token form preserves the type tag.
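The type-tag argument can be illustrated with a short sketch. It serializes non-float primitives with Python's json.dumps and ensure_ascii=False, which to the best of my understanding matches JCS string escaping for ordinary strings; floats are deliberately refused because they need RFC 8785 number serialization. The function name value_canonical_bytes mirrors the field name but is otherwise hypothetical.

```python
import json

def value_canonical_bytes(value) -> bytes:
    """JSON-token bytes for a string, integer, boolean or null (sketch only)."""
    if isinstance(value, float):
        # Floats need the RFC 8785 number rules; do not guess here.
        raise TypeError("use an RFC 8785 library for floating-point values")
    return json.dumps(value, ensure_ascii=False).encode("utf-8")

as_string = value_canonical_bytes("42")   # keeps its surrounding quotes
as_number = value_canonical_bytes(42)     # bare digits
```

Because the quotes stay in the hashed bytes, the string "42" and the number 42 can never share a preimage.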
4.3 Worked example
Take fixture B1 (§8): {"name": "Alice", "age": 42}. The leaf at /age with the published salt produces this preimage and hash:
- profile_literal_utf8 = 73 61 74 73 69 67 6e 61 6c 2e 6a 73 6f 6e 2e 66 69 65 6c 64 2e 76 31 (23 bytes, ASCII satsignal.json.field.v1)
- separator = 00
- leaf_id_utf8 = 2f 61 67 65 (4 bytes, ASCII /age)
- separator = 00
- value_canonical_bytes = 34 32 (2 bytes, ASCII 42)
- separator = 00
- salt_bytes = base64-decode(/+N9Hd4a6Sp3vMDBX4kBvQ==) = ff e3 7d 1d de 1a e9 2a 77 bc c0 c1 5f 89 01 bd (16 bytes)
Full 48-byte preimage:
7361747369676e616c2e6a736f6e2e6669656c642e7631002f61676500343200ffe37d1dde1ae92a77bcc0c15f8901bd
leaf_hash = sha256(preimage) = 533e320213e48d42a8a9472c8ad12739a576ab172fd126b632ad4f27a79ae687.
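The byte layout of §4.1 can be rebuilt for this worked example with a few lines of Python. The helper names leaf_preimage and leaf_hash_hex are hypothetical; the field order, separators, and 16-byte salt length are taken from §4.1 and §5.1.

```python
import base64
import hashlib

PROFILE_LITERAL = b"satsignal.json.field.v1"

def leaf_preimage(leaf_id: str, value_bytes: bytes, salt_b64: str) -> bytes:
    """profile || 00 || leaf_id || 00 || value || 00 || salt, per section 4.1."""
    salt = base64.b64decode(salt_b64, validate=True)
    if len(salt) != 16:
        raise ValueError("salt must decode to 16 bytes")
    head = b"\x00".join([PROFILE_LITERAL, leaf_id.encode("utf-8"), value_bytes])
    return head + b"\x00" + salt

def leaf_hash_hex(leaf_id: str, value_bytes: bytes, salt_b64: str) -> str:
    """Hex sha256 digest of the preimage."""
    return hashlib.sha256(leaf_preimage(leaf_id, value_bytes, salt_b64)).hexdigest()

# Fixture B1, leaf 0: /age with value 42 and the published salt.
preimage = leaf_preimage("/age", b"42", "/+N9Hd4a6Sp3vMDBX4kBvQ==")
```

Comparing preimage.hex() with the 48-byte hex string above is a quick way to check a builder before comparing digests.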
5. Salts
5.1 Salt size
Decision (forever): 16 raw bytes per leaf, encoded in salt_b64 as standard base64 (RFC 4648 §4) with = padding. Rationale: 16 bytes (128 bits) is sufficient entropy to defeat brute-force preimage search against the leaf hash for low-entropy field values (a guessed candidate value cannot be tested against a target leaf_hash without also obtaining the salt). Going larger costs storage in proofs.json with no cryptographic gain at the sha256 output size.
5.2 Salt uniqueness
Decision (forever): each leaf MUST have a unique, independently-generated salt sourced from a cryptographically-secure pseudorandom number generator (CSPRNG). Salt reuse across leaves is forbidden, even within the same document. Rationale: reusing a salt would let an attacker who learns one revealed leaf's preimage relate its hash structure to other leaves' hashes; per-leaf salts make each leaf's preimage independent.
5.3 Salt persistence
Decision (forever): the anchorer persists the full salt-set client-side in the bundle's proofs.json (off-chain), keyed by leaf_id. The salt-set is NOT committed on-chain and NOT included in the canonical doc. Rationale: salts are part of the off-chain material an anchorer must retain to later produce a disclosure; losing them irrecoverably forfeits the ability to disclose. The on-chain commit binds the merkle root, which depends on the salts; the salts themselves are private until the anchorer chooses to reveal them as part of a disclosure record.
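A minimal sketch of salt generation under §5.1 to §5.3, using Python's secrets module as the CSPRNG. The function name new_salt_set is hypothetical, and the returned mapping keyed by leaf_id is only an assumed in-memory shape; the actual proofs.json schema is not defined in this section.

```python
import base64
import secrets

def new_salt_set(leaf_ids):
    """Return {leaf_id: salt_b64} with one fresh 16-byte CSPRNG salt per leaf."""
    salts = {}
    for leaf_id in leaf_ids:
        raw = secrets.token_bytes(16)                     # 128 bits per leaf
        salts[leaf_id] = base64.b64encode(raw).decode("ascii")  # RFC 4648, padded
    return salts

salt_set = new_salt_set(["/age", "/name"])
```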
6. Merkle behavior (cross-reference)
This profile defers all merkle-tree construction and proof invariants to ../disclosure-v1.md §3.4: hash algorithm pinning, single-leaf-tree behavior, odd-node promote-unchanged rule, leaf ordering source, and the per-step (sibling || frontier) vs (frontier || sibling) direction encoding. Per that section, raw 32-byte concatenation is used at every step; ASCII-hex concatenation is explicitly forbidden.
The only profile-defined input to that machinery is the leaf-set: its element count (every primitive value in the canonical form, per §3) and its total order (depth-first JCS-canonical-key order, per §3.3).
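For orientation only, the sketch below shows one way the properties named above (raw 32-byte concatenation, odd node promoted unchanged, single-leaf root equal to the leaf) could fit together. It assumes each parent is sha256(left || right) with no extra prefix, which this section does not state; ../disclosure-v1.md §3.4 remains the authority, and the function name merkle_root is hypothetical.

```python
import hashlib

def merkle_root(leaf_hashes):
    """Root over raw 32-byte leaf hashes (assumed parent = sha256(left || right))."""
    if not leaf_hashes:
        raise ValueError("empty_leaf_set")
    level = list(leaf_hashes)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            # Raw byte concatenation, never ASCII-hex concatenation.
            parents.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            parents.append(level[-1])  # odd node is promoted unchanged
        level = parents
    return level[0]
```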
7. Original anchor binding
A document anchored under this profile commits to its leaf-set through a chunk_merkle proof in the original .mbnt's canonical doc. The bound fields:
| Field | Value under this profile |
|---|---|
| subject.proofs.chunk_merkle.scheme | the literal string "satsignal.json.field.v1" |
| subject.proofs.chunk_merkle.algo | the literal string "sha256" (v1 is standard-mode only) |
| subject.proofs.chunk_merkle.leaf_count | total count of primitives in the JCS canonical form |
| subject.proofs.chunk_merkle.root | merkle root over leaves in depth-first JCS-canonical order |
A future sealed-mode variant of this profile would carry algo: "merkle-hmac-sha256" and a salt_version; sealed-mode JSON-field disclosure is out of scope for v1 (../disclosure-v1.md §4 step 5 fails closed on algo != "sha256").
A disclosure record bound to this profile carries linked_anchor.subject_profile == "satsignal.json.field.v1" and each revealed[i].profile == "satsignal.json.field.v1"; the disclosure verifier walks the binding chain in ../disclosure-v1.md §4 and the per-leaf recomputation in ../disclosure-v1.md §7 step 4.
8. Fixtures (test vectors)
The fixtures below are reproducible from the rules in §2–§5. All sha256 digests are computed (not placeholder); a verifier implementing this profile MUST reproduce them byte-for-byte. Salts in the fixtures are pinned deterministically — salt_b64 = base64(sha256("json-field-v1|" + leaf_id + "|" + idx)[:16]) — so the fixtures' hashes can be reproduced from public bytes alone. Production anchorers MUST source salts from a CSPRNG, not from a fixed derivation; the fixture salts exist purely to make the test vectors self-checkable.
B1 — minimal
Input:
{"name": "Alice", "age": 42}
JCS canonical bytes (length 25):
{"age":42,"name":"Alice"}
leaf_count = 2. Leaf records:
| i | leaf_id | value_canonical (text) | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /age | 42 | /+N9Hd4a6Sp3vMDBX4kBvQ== | 533e320213e48d42a8a9472c8ad12739a576ab172fd126b632ad4f27a79ae687 |
| 1 | /name | "Alice" | Aoki/skytzCabG+nnIwSaA== | e5cb099fc0fe04f443c0ff86879162159cfed1fa92c80862a86021d391f9563c |
Full merkle tree:
L0: [533e320213e48d42a8a9472c8ad12739a576ab172fd126b632ad4f27a79ae687,
e5cb099fc0fe04f443c0ff86879162159cfed1fa92c80862a86021d391f9563c]
L1: [c1f5e68c87dcbf89ebd99b0967a34e81fb730b70569733c295f9a4769132e17c]
root = c1f5e68c87dcbf89ebd99b0967a34e81fb730b70569733c295f9a4769132e17c.
The per-byte preimage breakdown for leaf 0 is §4.3.
B2 — nested
Input:
{"customer": {"name": "Alice"}, "items": [{"sku": "x"}, {"sku": "y"}]}
JCS canonical bytes (length 63):
{"customer":{"name":"Alice"},"items":[{"sku":"x"},{"sku":"y"}]}
leaf_count = 3. Leaf records (depth-first, JCS-canonical-key order — customer sorts before items):
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /customer/name | "Alice" | y3NcBZqzS4zaRIIdZRW46A== | e27f140c6ce4897056e72d3990e28282dbc444564f6967f35c7b42e24e2dd800 |
| 1 | /items/0/sku | "x" | YxpPackcqe2LJXvDbgCfBQ== | 7510298f34bd4ca1a744238dc4c72c19d027494d0d958bed3d9b7bdcb76f824d |
| 2 | /items/1/sku | "y" | 5e0+PYeV10DXCyVzc8yntQ== | 6fa371b72e1bbd0e4a9bb0faba40d77179bd662a6cd47d9f9910dcb4ee58a309 |
B3 — array of primitives
Input:
{"tags": ["red", "green", "blue"]}
JCS canonical bytes (length 31):
{"tags":["red","green","blue"]}
leaf_count = 3. Leaves at index order:
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /tags/0 | "red" | 2L3qHOs6n5ZKmWAuJPyLCA== | 05a106a0b911d29b1a074a7348cc6a35808fdc3d8f09bcf6aa69950f823acd17 |
| 1 | /tags/1 | "green" | /3GljQQXj1WcZs5sHPlE3g== | b60e1b4a97863ada9d603201a44cb12ea32b777b15649196d3bd8f02eaf18837 |
| 2 | /tags/2 | "blue" | vTtfu+WKxLOek5DVN1W5kA== | d130a05a39bcde12b74aa9ff199b407f0bef526e7ebb58c8633861deb308f1d2 |
B4 — escaping in key
Input:
{"a/b": 1, "c~d": 2}
JCS canonical bytes (length 17):
{"a/b":1,"c~d":2}
leaf_count = 2. Note the RFC 6901 escaping in the leaf_id:
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /a~1b | 1 | 2PuUnKcgEGlYFi9yA4o+Sg== | 8c11a0c77968fcd807d7bb479351f126019d08f009a73fd0bae748175f138803 |
| 1 | /c~0d | 2 | g2QACzqIeCl9r0pyDbBaqA== | 3d40ab8c4502d035e7088ccfb2bfd8720089b052fa32c525015e53dc2f951249 |
The / in the key a/b is escaped to ~1; the ~ in c~d is escaped to ~0. The canonical JCS form preserves the raw key bytes verbatim (no pointer escaping) — escaping applies only to the leaf_id derived from the key.
B5 — key-order independence
Both inputs canonicalize to identical bytes and (under identical salts) produce identical leaves and identical merkle roots.
Input A:
{"b": 1, "a": 2}
Input B:
{"a": 2, "b": 1}
JCS canonical bytes (length 13) for both inputs:
{"a":2,"b":1}
leaf_count = 2. Leaf records (identical for A and B):
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /a | 2 | s09hGGBqe743ijuoZtXX5Q== | 0fc62b52da5aa247be1ccc875a80f766b583311e46b8757f1457e4921e8fbd2c |
| 1 | /b | 1 | sJS1rWYeell4nZY/eczBeQ== | a3f063d1c47b3bb40ec0fd59c0d64c1d4d179b9612a54618b17290a2543d9d2e |
Full merkle tree:
L0: [0fc62b52da5aa247be1ccc875a80f766b583311e46b8757f1457e4921e8fbd2c,
a3f063d1c47b3bb40ec0fd59c0d64c1d4d179b9612a54618b17290a2543d9d2e]
L1: [fbe6a68a98d8f790c17910d7928bb059749566601e0a659a001742769425c2a9]
root = fbe6a68a98d8f790c17910d7928bb059749566601e0a659a001742769425c2a9 for both input A and input B.
B6 — number canonicalization
Both inputs canonicalize to identical bytes and produce an identical leaf.
Input A:
{"x": 1e10}
Input B:
{"x": 10000000000}
JCS canonical bytes (length 17) for both inputs:
{"x":10000000000}
leaf_count = 1. Leaf record (identical for A and B):
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /x | 10000000000 | x6FsYO5SXlSLgJclepIGrQ== | 8c1c6fc64c987c97126d6f89a673fcfd7ae8021dff940b56658cd9b36254a49f |
Because this is a single-leaf tree, the merkle root equals the leaf_hash directly (per ../disclosure-v1.md §3.4 invariant 2): root = 8c1c6fc64c987c97126d6f89a673fcfd7ae8021dff940b56658cd9b36254a49f, with proof_path = [] in any disclosure.
B7 — deeply nested (5 levels)
Input:
{"l1": {"l2": {"l3": {"l4a": {"l5x": "a", "l5y": "b"},
"l4b": {"l5z": "c"}}},
"l2_other": "shallow"}}
JCS canonical bytes (length 89):
{"l1":{"l2":{"l3":{"l4a":{"l5x":"a","l5y":"b"},"l4b":{"l5z":"c"}}},"l2_other":"shallow"}}
leaf_count = 4. Depth-first order: at root key l1, descend into l2 (sorts before l2_other); at l2 descend into l3; at l3 descend into l4a (sorts before l4b); inside l4a emit l5x then l5y; back up to l4b and emit l5z; back up to l1 and emit its remaining key l2_other. The sequence:
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /l1/l2/l3/l4a/l5x | "a" | 80KcYHXB/oHIaQTBShRJHw== | 13f4a8b217b1b6cf24e5b0d81e37071df1ce79f677a5d2a69cdda4c84b29044e |
| 1 | /l1/l2/l3/l4a/l5y | "b" | BdIpHXr8hwhTfuA0QH4/QA== | 91e4ba4fd003068817926556e3b637433881a4d4a124247ea55860344e1299d3 |
| 2 | /l1/l2/l3/l4b/l5z | "c" | r8NpIC2z1pJuc76tNwUBWg== | afdfaf21de5c84a470ed6620c417f0b855e0f23e9fc84d340739b003337df797 |
| 3 | /l1/l2_other | "shallow" | Zbhw7x3XGxWp18S/wmwBMw== | 1f843cf03d4819990c2de782e3f7d5be3927ed2bab4b07c8cf86e5af108fca0e |
This fixture exercises both the depth-first descent (the three deep-leaves emit before the shallow l2_other) and the JCS key-sort at each level (l2 before l2_other; l4a before l4b).
B8 — empty container ignored
Input:
{"a": {}, "b": [1, 2]}
JCS canonical bytes (length 18):
{"a":{},"b":[1,2]}
leaf_count = 2 (the empty object at /a contributes no leaves):
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /b/0 | 1 | OCp+U2WXiMn0e2gFrOe/iw== | dddc9b30277d36cb570a6753524066c79f292d0a409c3cdde1c04db189dc9ecf |
| 1 | /b/1 | 2 | sTl41+6TsYZDqLK6uBjt/w== | 9adf0476749d5f96335aacfedf3a0d92902dc518b4f4ce7ea3d0860659010e51 |
B9 — all four primitive types
Input:
{"s": "x", "n": 42, "b": true, "z": null}
JCS canonical bytes (length 34):
{"b":true,"n":42,"s":"x","z":null}
leaf_count = 4. JCS sorts keys: b, n, s, z.
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /b | true | 3uAJOOs4NDoURTt8IpEHUg== | 5f3493e5b8337640ad2b5a1db799a776a80c263bd54d2eef8d0fdc1072afd5fb |
| 1 | /n | 42 | gw566yLkzEmBS6VbGgLbHA== | ffd31facf8b5f0b481c18a861efd0116d6be68900205177f17527e9d17cf300a |
| 2 | /s | "x" | U+OG2AshqriwJ6vWssu0Aw== | e09e427a6396ade8cd34158196c2e7f782ae9b55db6f8def0453d73d5149994f |
| 3 | /z | null | bZZRVWIZmVu/4i/dxW+MBw== | ddcd08f44874a9db2f639efabe4b6ea129f1dd93843f327d6640a308fd5a9e70 |
Note the distinct value_canonical forms (true, 42, "x", null): each carries its JSON-token form (with quotes for strings; lowercase for boolean and null). The type tag is preserved in the hash.
B10 — Unicode key
Input:
{"日本語": "value"}
JCS canonical bytes (length 21 — {, ", 9 bytes of UTF-8 for 日本語, ", :, ", value, ", }):
{"日本語":"value"}
leaf_count = 1. The leaf_id carries the raw UTF-8 bytes of the key (no Punycode, no percent-encoding):
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /日本語 | "value" | tO3WWZRnu/GtAhbPcxwXRQ== | e9cb79b797d5b63386a673e9437f3e04a2b6efee5909dbf2f955a920949a539e |
A verifier implementing JSON Pointer per RFC 6901 emits the key's UTF-8 bytes verbatim into the pointer; the leaf-hash preimage byte sequence at the leaf_id_utf8 slot is therefore 2f e6 97 a5 e6 9c ac e8 aa 9e (/ followed by the nine UTF-8 bytes of 日本語).
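The byte counts in this fixture can be checked with a short sketch. It uses Python's json.dumps with ensure_ascii=False as a stand-in for JCS, which is adequate for this single-key document but is an assumption, not a general replacement for an RFC 8785 library.

```python
import json

doc = {"日本語": "value"}

# Compact separators and raw (unescaped) non-ASCII output approximate JCS here.
canonical = json.dumps(doc, ensure_ascii=False, separators=(",", ":")).encode("utf-8")

# The leaf_id is the pointer text; its UTF-8 bytes go into the preimage as-is.
leaf_id_utf8 = "/日本語".encode("utf-8")
leaf_id_hex = leaf_id_utf8.hex(" ")
```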
B11 — supplementary-plane (astral) key
This fixture exercises the astral / UTF-16 surrogate pair edge of RFC 8785 §3.2.3 key sorting. The key 🎉 (U+1F389, PARTY POPPER) is a supplementary-plane code point; its UTF-8 encoding is four bytes (F0 9F 8E 89) and its UTF-16 encoding is the surrogate pair (D83C DF89).
Salt scheme (pinned). B11's leaves use deterministic salts bytes([0x10 + i]) * 16 for leaf index i, encoded base64. This makes the fixture independently reproducible; production anchorers MUST use a CSPRNG per §5.
Input:
{"🎉": "party", "a": 1}
JCS canonical bytes (length 22, hex 7b2261223a312c22f09f8e89223a227061727479227d):
{"a":1,"🎉":"party"}
(The astral character is emitted verbatim as UTF-8 bytes, NOT as a \ud83c\udf89 surrogate-pair escape; RFC 8785 §3.2.2.2 escapes only the JSON-required control characters and ASCII " and \; all other code points pass through as their UTF-8 bytes.)
UTF-16 sort observation (pin): under RFC 8785's UTF-16 code-unit order, "a" (0061) sorts before "🎉" (D83C DF89) because 0x0061 < 0xD83C. Python's naive codepoint sort agrees for THIS pair (ord("a")=0x61 < ord("🎉")=0x1F389), so the two orderings coincide here. B11 demonstrates that JCS correctly handles an astral key in the leaf_id, but it does not construct a key set where UTF-16 sort diverges from codepoint sort. A diverging set needs a BMP key in U+E000–U+FFFF next to the astral key: such a key sorts BEFORE the astral key under codepoint sort (its code point is below U+10000) but AFTER it under UTF-16 code-unit sort (its single code unit exceeds D83C). BMP code points in U+D800–U+DFFF cannot serve, because that range is the surrogate range and is not legal as standalone code points in a JSON string. No diverging fixture is pinned for v1; the rule prose in §2.4 is pinned instead, and implementers MUST use a real RFC 8785 library.
leaf_count = 2. Leaf records (depth-first, JCS-canonical-key order — "a" sorts before "🎉"):
| i | leaf_id | leaf_id (hex) | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|---|
| 0 | /a | 2f61 | 1 | EBAQEBAQEBAQEBAQEBAQEA== | f45379b15faae8f901456be33951bf7d3197e2de19f8f12e0914a9dfa20650db |
| 1 | /🎉 | 2f f0 9f 8e 89 | "party" | EREREREREREREREREREREQ== | 53461806857e1c8d1ccbdcc122b14b393204ad151608378f3caa44c712eb301e |
Full merkle tree:
L0: [f45379b15faae8f901456be33951bf7d3197e2de19f8f12e0914a9dfa20650db,
53461806857e1c8d1ccbdcc122b14b393204ad151608378f3caa44c712eb301e]
L1: [75b6ca1a4ed11e957b93a79e59a2c151230f3995e6cef4b95f69036b2c77dc6a]
root = 75b6ca1a4ed11e957b93a79e59a2c151230f3995e6cef4b95f69036b2c77dc6a.
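B11's pinned salts, canonical bytes, and leaf_id bytes can be re-derived with the sketch below. The helper name b11_salt_b64 is hypothetical. The canonical bytes are produced with json.dumps and sort_keys=True, which sorts by code point; that is acceptable only because, as noted above, the two orderings coincide for this particular key pair.

```python
import base64
import json

def b11_salt_b64(i: int) -> str:
    """Fixture-only salt: the byte 0x10 + i repeated 16 times, base64-encoded."""
    return base64.b64encode(bytes([0x10 + i]) * 16).decode("ascii")

doc = {"\U0001f389": "party", "a": 1}

# Stand-in for JCS on this input only (codepoint sort == UTF-16 sort here).
canonical = json.dumps(
    doc, ensure_ascii=False, separators=(",", ":"), sort_keys=True
).encode("utf-8")

astral_leaf_id_hex = "/\U0001f389".encode("utf-8").hex()
```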
B12 — supplementary-plane string value
This fixture exercises the astral character in a string value edge of RFC 8785 §3.2.2.2. Per JCS, only the control characters and ASCII " and \ are escaped; all other code points (including supplementary-plane) are emitted as their UTF-8 bytes verbatim. A JCS implementation that emits \ud83c\udf89 surrogate-pair escapes for the astral character would produce different canonical bytes and is non-conformant.
Salt scheme (pinned). bytes([0x20]) * 16 for the single leaf.
Input:
{"emoji": "Hello 🎉 World"}
JCS canonical bytes (length 28, hex 7b22656d6f6a69223a2248656c6c6f20f09f8e8920576f726c64227d):
{"emoji":"Hello 🎉 World"}
leaf_count = 1. Leaf record:
| i | leaf_id | value_canonical (text) | value_canonical (hex) | salt_b64 | leaf_hash |
|---|---|---|---|---|---|
| 0 | /emoji | "Hello 🎉 World" | 2248656c6c6f20f09f8e8920576f726c6422 | ICAgICAgICAgICAgICAgIA== | f3b3e429dad8919ddcf649285f2ce201a4bb521fc9c2fbfb0e655907fa492235 |
Single-leaf tree, so root = f3b3e429dad8919ddcf649285f2ce201a4bb521fc9c2fbfb0e655907fa492235 (per ../disclosure-v1.md §3.4 invariant 2).
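The difference between verbatim UTF-8 output and surrogate-pair escaping can be seen with Python's standard `json` module. This is a sketch for illustration only: stdlib `json` is NOT an RFC 8785 implementation, and it is used here solely because its `ensure_ascii` switch reproduces both behaviours for this one input.

```python
import json

doc = {"emoji": "Hello 🎉 World"}

# Conformant shape for this input: astral character emitted as raw UTF-8.
verbatim = json.dumps(doc, ensure_ascii=False, separators=(",", ":")).encode("utf-8")

# Non-conformant shape: astral character emitted as a \uXXXX surrogate pair.
escaped = json.dumps(doc, ensure_ascii=True, separators=(",", ":")).encode("utf-8")

# Canonical bytes pinned by fixture B12.
pinned_hex = "7b22656d6f6a69223a2248656c6c6f20f09f8e8920576f726c64227d"

print(verbatim.hex() == pinned_hex, len(verbatim))
print(escaped.decode("ascii"))
```

The escaped form is 8 bytes longer than the pinned canonical form, so any leaf hash computed over it cannot match the fixture.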
B13 — float requiring shortest round-trip
This fixture exercises the floating-point shortest-round-trip edge of RFC 8785 §3.2.2.3. The sum 0.1 + 0.2 in IEEE 754 binary64 is a value whose shortest round-tripping decimal representation is 0.30000000000000004; that representation, exactly 17 significant digits, is what RFC 8785 (and ECMA-262 §7.1.12 / Number.prototype.toString) prescribes. Both Python repr and a real RFC 8785 library agree on this particular value, but the fixture pins the canonical bytes so implementers can verify their library matches.
Salt scheme (pinned). bytes([0x30 + i]) * 16 for leaf index i.
Input (the sum field is literally the IEEE 754 result of 0.1 + 0.2):
{"x": 0.1, "y": 0.2, "sum": 0.30000000000000004}
JCS canonical bytes (length 43):
{"sum":0.30000000000000004,"x":0.1,"y":0.2}
Float canonicalization observation (pin): jcs.canonicalize produces 0.1 for the value 0.1, 0.2 for 0.2, and 0.30000000000000004 for 0.1 + 0.2. These are the unique shortest decimal representations that round-trip through IEEE 754 binary64, matching RFC 8785 §3.2.2.3 and ECMA-262 §7.1.12. Implementations whose float canonicalization produces (for example) 3.0000000000000004e-1 or 0.30000000000000005 for 0.1 + 0.2 are non-conformant and will produce a different merkle root.
leaf_count = 3. Leaf records (depth-first, JCS-canonical-key order — sum, x, y sort alphabetically under UTF-16 code-unit sort):
| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
|---|---|---|---|---|
| 0 | /sum | 0.30000000000000004 | MDAwMDAwMDAwMDAwMDAwMA== | b267dfb1b255784965e3d79dbf0f3b49b04e7811bb5239b3a7f08c81242297e0 |
| 1 | /x | 0.1 | MTExMTExMTExMTExMTExMQ== | 2d4e9189a942dce000a7c582e549e96ccc2e9615ef91388d62fd8750b8a4d2e2 |
| 2 | /y | 0.2 | MjIyMjIyMjIyMjIyMjIyMg== | 62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4 |
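The salt_b64 column and the sum value in this table can be reproduced from the pinned salt scheme and IEEE 754 arithmetic. The following is a minimal sketch; the helper name `pinned_salt_b64` is an assumption for illustration, and `repr` is used only because this fixture states that it agrees with RFC 8785 for this particular value.

```python
import base64

def pinned_salt_b64(i: int) -> str:
    # B13 salt scheme: the byte 0x30 + i repeated 16 times, base64-encoded.
    return base64.b64encode(bytes([0x30 + i]) * 16).decode("ascii")

salts = [pinned_salt_b64(i) for i in range(3)]

# The "sum" field is the binary64 result of 0.1 + 0.2.
total = 0.1 + 0.2
shortest = repr(total)  # agrees with JCS for THIS value only; not a general rule

for i, salt in enumerate(salts):
    print(i, salt)
print(shortest)
```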
Full merkle tree (odd-node promote-unchanged rule from ../disclosure-v1.md §3.4 invariant 3):
L0: [b267dfb1b255784965e3d79dbf0f3b49b04e7811bb5239b3a7f08c81242297e0,
2d4e9189a942dce000a7c582e549e96ccc2e9615ef91388d62fd8750b8a4d2e2,
62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4]
L1: [2c931341ad58f50038a2a671e2a46c68bdb3be60cdc1f5f276878129fe6b50ed,
62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4]
L2: [1bdafd4b48bf230503eca1f44ea781b15258fe985331998ab203114791207764]
root = 1bdafd4b48bf230503eca1f44ea781b15258fe985331998ab203114791207764.
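The shape of this three-leaf tree follows from the odd-node promote-unchanged rule. The sketch below shows only that shape. The parent-hash rule is defined in ../disclosure-v1.md §3.4 and is not restated in this section, so `placeholder_combine` is a hypothetical stand-in and its output should not be treated as the pinned L1 or L2 values. The name `build_levels` and the rejection of an empty leaf list are likewise assumptions of this sketch.

```python
import hashlib
from typing import Callable, List

def build_levels(
    leaves: List[bytes], combine: Callable[[bytes, bytes], bytes]
) -> List[List[bytes]]:
    # Returns every tree level from the leaves (L0) up to the root.
    if not leaves:
        raise ValueError("at least one leaf is required (sketch assumption)")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        nxt = []
        for j in range(0, len(prev), 2):
            if j + 1 < len(prev):
                nxt.append(combine(prev[j], prev[j + 1]))
            else:
                nxt.append(prev[j])  # odd node: promoted unchanged
        levels.append(nxt)
    return levels

def placeholder_combine(left: bytes, right: bytes) -> bytes:
    # HYPOTHETICAL parent rule for illustration; the real rule lives in
    # disclosure-v1 §3.4 and may frame or prefix its inputs differently.
    return hashlib.sha256(left + right).digest()

leaves = [bytes.fromhex(h) for h in (
    "b267dfb1b255784965e3d79dbf0f3b49b04e7811bb5239b3a7f08c81242297e0",
    "2d4e9189a942dce000a7c582e549e96ccc2e9615ef91388d62fd8750b8a4d2e2",
    "62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4",
)]
levels = build_levels(leaves, placeholder_combine)
print([len(level) for level in levels])
```

The third leaf appears unchanged as the second entry of L1, which matches the pinned tree above where 62971f50… is carried from L0 into L1.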
9. Out of scope for v1
The following are explicitly not addressed by this profile and are not implied by satsignal.json.field.v1:
- Selective disclosure of container nodes. Revealing /customer as a whole or /items as a whole is not a profile operation; an anchorer who wants that semantic must disclose the primitive descendants individually. A future satsignal.json.subtree.v1 (or similar) would be needed to commit container nodes as leaves; it would carry a different profile literal and its own forever-contract.
- Schema-aware projection. This profile knows nothing about JSON Schema, OpenAPI shapes, or type constraints. A leaf at /customer/age is a primitive regardless of whether the schema declares it integer or string. Schema-aware disclosure is a future profile.
- Semantic typing. A string "2026-05-27" and a string "hello" are equal kinds under this profile. Distinguishing "ISO 8601 date" from "free-form text" is the anchorer's concern; a semantic-typed profile (e.g. one that normalizes date strings to a canonical form before hashing) is a separate forever-contract.
- JSON Schema integration. No $ref resolution, no $id binding, no schema-derived field ordering. The leaf order is defined by JCS canonical key sort, full stop.
- JSON Patch / JSON Merge Patch use cases. Disclosure under this profile reveals the values of leaves committed at anchor time; it does not express deltas, edits, or patches against the original. A future profile could carry patch semantics; it would not be v1.
- Sealed mode. v1 of this profile is algo: sha256 only. Sealed variants (HMAC-keyed leaves, salt-version dispatch) are deferred to a future profile minor or a per-profile sealed addendum; see ../disclosure-v1.md §4 step 5.
- Email JSON payloads. Anchoring a JSON dump of a parsed email is technically possible under this profile, but satsignal.email.* profiles (which would pin MIME / multipart / quoted-reply rules) are explicitly out of scope.
10. Implementation note — why not a hand-rolled JCS
An earlier draft of this spec carried a hand-rolled JCS approximation as a reference implementation: sorted(dict.items(), key=lambda kv: kv[0]) for object keys, minimal-whitespace separators, and Python repr for non-integer floats. Two specific divergences from RFC 8785 were noted before that draft shipped:
- Object-key sort. Python's sorted is a codepoint sort (compares Unicode scalar values directly). RFC 8785 §3.2.3 mandates a UTF-16 code-unit sort. These diverge for supplementary-plane keys (U+10000 – U+10FFFF): under UTF-16 the first code unit of any astral character is a high surrogate (D800–DBFF), which sorts strictly AFTER any BMP code point below 0xD800 but strictly BEFORE BMP code points in the range E000–FFFF. The astral codepoint itself, by contrast, sorts after every BMP character under codepoint order. A pair of keys exists where the two orderings give opposite results; a spec-compliant verifier and a naive verifier would compute different canonical bytes (and different merkle roots) for the same input.
- Float canonicalization. Python's repr(float) is a shortest-round-trip representation, but it is not byte-for-byte identical to the ECMAScript 6.0 Number.prototype.toString algorithm that RFC 8785 §3.2.2.3 prescribes. The two agree on many ordinary inputs (including 0.1, 0.2, and 0.30000000000000004; see B13) but diverge in formatting on others: for example, repr(1e16) is 1e+16 where ECMAScript emits 10000000000000000, and repr(1e-7) is 1e-07 where ECMAScript emits 1e-7. Relying on repr is therefore a forever-trap.
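The float divergence can be made concrete with a handful of values. In the sketch below the Python strings are computed, while the ECMAScript strings are transcribed by hand from the behaviour of Number.prototype.toString and are not produced by this code; the sample values are assumptions picked for illustration and are not pinned fixtures.

```python
samples = [1e16, 1e-7, 1e-5, 0.1 + 0.2]

# What Python's repr produces for each sample.
python_forms = [repr(x) for x in samples]

# What ECMAScript's Number.prototype.toString produces for the same
# values (hand-transcribed, not computed here).
ecmascript_forms = ["10000000000000000", "1e-7", "0.00001", "0.30000000000000004"]

# Values on which a repr-based canonicalizer would emit the wrong bytes.
diverging = [
    x for x, py, es in zip(samples, python_forms, ecmascript_forms) if py != es
]

for x, py, es in zip(samples, python_forms, ecmascript_forms):
    print(f"{py:>22}  vs  {es}")
```

Only the last sample (the B13 sum) matches; the other three differ in exponent threshold or exponent padding even though the significant digits are the same.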
Neither divergence affected fixtures B1–B10 (all BMP keys, integer or simple-decimal values), so the inline approximation was internally consistent with its own fixture set. But a real verifier using a real JCS library would have computed different canonical bytes for any input that crossed those edges. Fixtures B11 (astral key) and B13 (float requiring shortest-round-trip) exercise the neighbouring cases, although, as their pinned observations note, the specific values chosen there canonicalize identically under both approaches.
The forever-contract in §2.4 closes this trap: profile-conformant implementations MUST use a real RFC 8785 library. For Python, that is the jcs PyPI module (version ≥ 0.2.1, jcs.canonicalize(obj) -> bytes); for other languages, any library whose test suite passes the RFC 8785 reference vectors. The fixtures in §8 were verified against jcs 0.2.1; verifier authors in any language SHOULD likewise re-verify every fixture's canonical bytes and leaf hashes against their chosen library before declaring implementation conformance.
Questions about this specification? Email hello@satsignal.cloud.