satsignal.json.field.v1 — JSON-field selective-disclosure profile

DEPRECATED / INERT (read this first). This satsignal.json.field.v1 dotted profile is deprecated and inert. It defined a salted rule with one leaf per nested primitive value addressed by RFC-6901 JSON Pointer (every deep field a leaf) and a salted/framed leaf preimage. No production flow emits or consumes it. Its allowlist literal is retained forever (an allowlist literal is never removed) and its frozen regression corpus is kept solely as a regression-guard record — it is never produced or verified by any live path. The rules below stay frozen for that regression corpus; do not implement them for new work. Live successor: json-keypath-v1. New JSON disclosures use the native json-keypath-v1 literal, which binds to the chunk_merkle a JSON anchor already commits. Note the granularity difference: this deprecated profile hashed every deep RFC-6901 pointer (salted); the live json-keypath-v1 segments by top-level key (key:jcs(value) entry, bare/sealed leaf, native merkle binding — no re-anchor, no salt keyfile in standard mode). The two cannot interbind. Authority for the deprecation: disclosure-v1 §11.

Versioning (2026-05-27). This is satsignal.json.field.v1. The profile literal is fixed at "satsignal.json.field.v1" and is bound into the leaf-hash preimage at anchor time. This literal is a forever-contract: once any client has anchored under it, the segmentation, canonicalization, normalization, salting, and leaf_id construction rules pinned below are fixed for that literal forever. A bug in those rules cannot be patched in place; the only remedy is a new satsignal.json.field.vN+1 profile that compatible verifiers must support in parallel.

Status: draft 1, 2026-05-27.

Audience: integrators who anchor a structured JSON document (configs, invoices, API records, structured event payloads) and later want to publish a redacted view revealing a chosen subset of fields with cryptographic proof that those fields are members of the original anchor; verifier authors who must reproduce per-leaf hashes byte-for-byte to render and check such a view.

Goal: define exactly one set of rules (input encoding, canonical form, leaf extraction, leaf identifier, leaf-hash preimage, salt strategy) under which a JSON document is segmented into a leaf-set whose merkle root commits to every primitive value in the document, so that a later disclosure can selectively reveal any subset of those values.

1. Why this exists

A great deal of the data the notary is asked to anchor arrives as structured JSON: configuration snapshots, invoices, API request / response records, IoT readings, audit-log events. The anchorer often wants to commit the whole record at time T1, then later publish a partial view of it — a few specific fields — without revealing the other fields and without re-anchoring. Selective disclosure (../disclosure-v1.md) provides the machinery; this profile pins the per-leaf rules that make it work for JSON.

The design strategy:

What this profile deliberately does NOT do:

2. Inputs and canonicalization

2.1 Encoding

Decision (forever): UTF-8 mandatory. Inputs that are not valid UTF-8 fail closed at preprocessing with invalid_utf8_input before any parsing is attempted. Rationale: JSON's interchange encoding is UTF-8 per RFC 8259 §8.1; admitting other encodings would force every verifier to ship multi-codec recovery and would break canonical-bytes reproducibility.

2.2 BOM handling

Decision (forever): A single leading UTF-8 byte-order mark (EF BB BF) at byte offset 0 is stripped before parsing. BOMs that appear elsewhere in the byte stream are NOT stripped; they live inside string values as the Unicode character U+FEFF and are part of the value. Rationale: RFC 8259 §8.1 forbids generators from adding a leading BOM (while allowing parsers to ignore one), but real-world tooling emits one anyway; tolerating exactly the leading case is the documented industry workaround. Stripping an interior U+FEFF would silently corrupt string content.
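
The encoding and BOM rules in §2.1 and §2.2 can be sketched as a single preprocessing step. This is a minimal illustration, not a reference implementation; the names ProfileError and preprocess are hypothetical and do not come from this specification.

```python
UTF8_BOM = b"\xef\xbb\xbf"


class ProfileError(ValueError):
    """Hypothetical error type that carries a fail-closed code as its message."""


def preprocess(raw: bytes) -> str:
    """Strip one leading BOM, then decode strictly as UTF-8."""
    # Only a BOM at byte offset 0 is removed (section 2.2).
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    try:
        # "utf-8" (not "utf-8-sig") is used so any later U+FEFF stays in the text.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ProfileError("invalid_utf8_input") from exc


print(preprocess(b'\xef\xbb\xbf{"a":1}'))
```

An interior U+FEFF inside a string value survives this step unchanged, which is the behavior §2.2 asks for.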

2.3 JSON validity

Decision (forever): Input MUST be valid JSON per RFC 8259. The following extensions are NOT permitted and fail closed with invalid_json_input:

Rationale: the canonicalization step is defined only over strict RFC 8259 inputs; admitting JSON5 / JSONC variants opens an unbounded set of edge cases that would be forever-contracted into the profile.
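
As a hedged sketch of the fail-closed parse, Python's standard json module already rejects comments and trailing commas, but by default it accepts the non-standard NaN and Infinity tokens, so those need an explicit hook. The names ProfileError and parse_strict are hypothetical, and this sketch does not claim to cover every check a conformant implementation needs.

```python
import json


class ProfileError(ValueError):
    """Hypothetical error type that carries a fail-closed code as its message."""


def _reject_constant(token: str):
    # json.loads calls this hook for the tokens NaN, Infinity and -Infinity.
    raise ProfileError("invalid_json_input")


def parse_strict(text: str):
    """Parse JSON text, failing closed on syntax errors and non-finite numbers."""
    try:
        return json.loads(text, parse_constant=_reject_constant)
    except json.JSONDecodeError as exc:
        raise ProfileError("invalid_json_input") from exc


print(parse_strict('{"x": 1}'))
```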

2.4 Canonical form

Decision (forever): the input is run through RFC 8785 JCS before any further processing. JCS pins, in summary:

A verifier MUST reproduce JCS output byte-for-byte; deviation breaks the leaf-hash preimage. Rationale: JCS is the only widely-deployed spec that pins JSON canonical bytes; reinventing canonicalization inside this profile would be a forever-contract on every edge case JCS already settled.

Forever-contract: use a real RFC 8785 implementation. Canonical form is RFC 8785 JCS, not an approximation of it. A profile-conformant anchorer or verifier MUST use a real RFC 8785 implementation (e.g. the Python jcs PyPI module ≥ 0.2.1, or any other library whose test suite passes the RFC 8785 reference vectors). Hand-rolled approximations of JCS (typically sorted keys, minimal-whitespace separators, and the host language's native float formatting) are not conformant; they diverge from RFC 8785 on at least two edges:

  1. Supplementary-plane code points (U+10000–U+10FFFF). RFC 8785 §3.2.3 sorts object keys by UTF-16 code-unit order, not Unicode codepoint order. The two orderings diverge for keys that contain astral characters: e.g. U+1F389 (🎉) encodes as the UTF-16 surrogate pair D83C DF89, so under UTF-16 code-unit sort it sorts AFTER any BMP key whose first code unit is below 0xD83C (including all of U+0000–U+D7FF) but BEFORE any BMP key whose first code unit is in 0xE000–0xFFFF. A naive Python sorted(keys) puts the same key strictly AFTER every BMP code point. Pairs of keys exist where the two orderings produce different output bytes.
  2. Floating-point values. RFC 8785 §3.2.2.3 mandates the ECMAScript 6.0 Number.prototype.toString algorithm — the "shortest decimal that round-trips through IEEE 754 binary64." Python's repr is a close approximation but is not byte-for-byte identical to ECMAScript's output for every input; for trust applications the only safe choice is a library that tracks the RFC.
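
The key-ordering divergence in item 1 can be seen with a small experiment. This is an illustrative sketch only: sorting by big-endian UTF-16 bytes is one way to emulate a UTF-16 code-unit sort, the helper name utf16_sort_key is hypothetical, and the fullwidth letter U+FF5A was chosen here purely as an example of a BMP key above the surrogate range.

```python
def utf16_sort_key(key: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as 16-bit code units.
    return key.encode("utf-16-be")


# ASCII "a", fullwidth "z" (U+FF5A, BMP), party popper (U+1F389, astral).
keys = ["\uff5a", "\U0001F389", "a"]

codepoint_order = sorted(keys)                  # naive Python sort
utf16_order = sorted(keys, key=utf16_sort_key)  # RFC 8785 style ordering

print([hex(ord(k)) for k in codepoint_order])   # ['0x61', '0xff5a', '0x1f389']
print([hex(ord(k)) for k in utf16_order])       # ['0x61', '0x1f389', '0xff5a']
```

The two orders swap the last two keys, so an object containing both would canonicalize to different bytes under the two rules.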

This is a forever-contract: any future implementation that wants to verify a satsignal.json.field.v1 anchor MUST use real RFC 8785 canonicalization. See §10 ("Why not a hand-rolled JCS") for the historical context behind this constraint, and the worked examples B11–B13 (§8) for inputs that specifically exercise the divergent edges.

2.5 Top-level type

Decision (forever): the top-level value MUST be a JSON object or a JSON array. Top-level primitives — a bare string, number, boolean, or null at the root — fail closed with invalid_top_level_type. Rationale: a top-level primitive has no leaf-set in a meaningful sense — there is one trivially-addressed value at the root (JSON Pointer "") and no point in segmenting it. Anchorers who want to commit a single primitive value can wrap it: {"value": "..."}.
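
A minimal sketch of the top-level check, assuming the document has already been parsed into Python values; the function name check_top_level is hypothetical.

```python
import json


def check_top_level(doc):
    """Accept only an object (dict) or array (list) at the root."""
    if not isinstance(doc, (dict, list)):
        raise ValueError("invalid_top_level_type")
    return doc


# A bare primitive can be wrapped by the anchorer, as the text suggests.
print(check_top_level(json.loads('{"value": "..."}')))
```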

2.6 Number canonicalization

Decision (forever): per JCS. Integer-valued numbers serialize as bare decimals without a trailing decimal point. Floats serialize as their shortest round-trip representation. Exponent forms collapse to integer form when the value is an integer in IEEE 754 double. Examples (canonical form on the right):

| Input | JCS canonical form |
| --- | --- |
| 1e10 | 10000000000 |
| 10000000000 | 10000000000 |
| 10000000000.0 | 10000000000 |
| 1.0 | 1 |
| 3.14 | 3.14 |
| -0 | 0 |
| 1.5e-2 | 0.015 |

Worked example: {"x": 1e10} and {"x": 10000000000} and {"x": 10000000000.0} all canonicalize to the byte sequence {"x":10000000000} and produce identical leaves. See fixture B6.
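
For intuition only, the table above can be reproduced in Python for a restricted range of values. This sketch is NOT a substitute for a real RFC 8785 library (see §2.4): it assumes that Python's repr and the ECMAScript algorithm agree on positional notation for magnitudes in [1e-4, 1e16), and it refuses everything outside that range. The function name format_number_sketch is hypothetical.

```python
import math


def format_number_sketch(x) -> str:
    """Illustrative formatting for a limited range; not conformant JCS."""
    x = float(x)  # JSON numbers are treated as IEEE 754 doubles
    if math.isnan(x) or math.isinf(x):
        raise ValueError("invalid_json_input")
    if x == 0:
        return "0"  # covers both 0.0 and -0.0
    if not (1e-4 <= abs(x) < 1e16):
        raise NotImplementedError("outside the range this sketch handles")
    text = repr(x)  # positional notation within this range
    # Integer-valued doubles drop the trailing ".0".
    return text[:-2] if text.endswith(".0") else text


for sample in (1e10, 10000000000, 10000000000.0, 1.0, 3.14, -0.0, 1.5e-2):
    print(sample, "->", format_number_sketch(sample))
```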

NaN, +Infinity, -Infinity are not legal JSON numbers and are rejected per §2.3.

3. Leaf extraction

3.1 What is a leaf

Decision (forever): every primitive value (string, number, true, false, null) reachable from the root via JCS-canonical traversal is a leaf. Objects and arrays are NOT leaves — they are container nodes whose primitive descendants ARE the leaves. Rationale: the disclosure model assumes the anchorer wants to reveal individual scalar facts ("the customer's name", "the price of item 3"). Treating a container as a leaf would either commit to its full JCS bytes (defeating per-field disclosure) or require a separate sub-tree commitment (a different leaf-hash preimage — not v1).

Example. The document

{"a": {"b": 1, "c": "x"}, "d": [10, 20]}

produces four leaves at JSON Pointer paths /a/b, /a/c, /d/0, /d/1. The object /a, the document root, and the array /d are NOT leaves; they are containers walked to find the primitives.

3.2 leaf_id construction

Decision (forever): the leaf_id is the RFC 6901 JSON Pointer that addresses the leaf in the canonical form. Standard RFC 6901 escaping applies to object keys:

  - ~ in a key is written as ~0
  - / in a key is written as ~1

(Escaping order matters: replace ~ first, then /; this is the RFC 6901 contract and the rule any pointer-aware library implements.)

Array elements address by zero-based decimal index with no leading zeros. Rationale: JSON Pointer is the standard cross-implementation way to address a value inside a JSON document; it has stable semantics, well-defined escaping, and avoids reinventing a pointer syntax inside this profile.

Examples:

| Position in document | leaf_id |
| --- | --- |
| object key name at root | /name |
| object key name inside customer | /customer/name |
| array index 3 of root key items | /items/3 |
| key sku inside element 0 of root items | /items/0/sku |
| key with / in its name: {"a/b": ...} | /a~1b |
| key with ~ in its name: {"c~d": ...} | /c~0d |
| Unicode key: {"日本語": ...} | /日本語 (raw UTF-8 bytes) |
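
The leaf_id rules above can be sketched in a few lines. The helper names escape_token and build_leaf_id are hypothetical; a path is modeled here as a list whose elements are str (object keys) or int (array indices).

```python
def escape_token(key: str) -> str:
    # Order matters: "~" must be replaced before "/" (RFC 6901).
    return key.replace("~", "~0").replace("/", "~1")


def build_leaf_id(path) -> str:
    """Join path steps into an RFC 6901 JSON Pointer."""
    parts = []
    for step in path:
        if isinstance(step, int):
            parts.append(str(step))  # zero-based decimal, no leading zeros
        else:
            parts.append(escape_token(step))
    return "".join("/" + part for part in parts)


print(build_leaf_id(["items", 0, "sku"]))  # /items/0/sku
print(build_leaf_id(["a/b"]))              # /a~1b
print(build_leaf_id(["c~d"]))              # /c~0d
```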

3.3 Leaf ordering

Decision (forever): depth-first, JCS-canonical-key order. The anchorer walks the JCS-canonicalized document:

  1. At each object node, visit keys in the JCS-canonical order (lexicographic sort over UTF-16 code units, identical to JCS object-key ordering).
  2. At each array node, visit elements in increasing index order (0, 1, 2, …).
  3. Recurse depth-first; emit a leaf record when reaching a primitive value.

Leaf 0 is the first primitive emitted by this walk; leaf N-1 is the last. This ordering defines the leaf-set's positions in the merkle tree (see ../disclosure-v1.md §3.4 invariant 4). Rationale: a fixed total order is required for the merkle tree to have a single root; depth-first JCS-canonical-key order is the unique walk that an anchorer and a verifier can both reproduce from the canonical bytes alone, with no auxiliary table.
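
The walk described above can be sketched as a recursive generator over an already-parsed document. This is an illustration under stated assumptions: the name walk_leaves is hypothetical, keys are ordered by their big-endian UTF-16 bytes to emulate the code-unit sort, and the yielded values are Python objects rather than value_canonical_bytes.

```python
def _utf16_key(key: str) -> bytes:
    return key.encode("utf-16-be")


def _escape(key: str) -> str:
    return key.replace("~", "~0").replace("/", "~1")


def walk_leaves(node, prefix=""):
    """Yield (leaf_id, value) pairs in depth-first, canonical-key order."""
    if isinstance(node, dict):
        for key in sorted(node, key=_utf16_key):
            yield from walk_leaves(node[key], prefix + "/" + _escape(key))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk_leaves(item, f"{prefix}/{index}")
    else:
        yield prefix, node  # primitive: string, number, true, false, null


doc = {"d": [10, 20], "a": {"c": "x", "b": 1}}
print([leaf_id for leaf_id, _ in walk_leaves(doc)])
# ['/a/b', '/a/c', '/d/0', '/d/1']
```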

3.4 Empty containers

Decision (forever): an empty object {} and an empty array [] contribute zero leaves. They are not themselves leaves and have no primitive descendants; they neither appear in the leaf-set nor increment the leaf count. Rationale: an empty container has no committable value. Anchorers who need to commit "this field is intentionally empty" can substitute a sentinel primitive, e.g. null (which DOES produce a leaf with value_canonical_bytes = null) instead of {} or [].

3.5 Documents with zero leaves

Decision (forever): forbidden. A canonical input that produces zero leaves — top-level {}, top-level [], or any nested structure with no primitive descendants — fails closed with empty_leaf_set. Rationale: a leaf-set of size zero has no merkle root to commit; the anchor would have nothing to bind the disclosure into.
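
A small sketch of the zero-leaf check from §3.4 and §3.5, assuming a parsed document; the names count_leaves and require_leaves are hypothetical.

```python
def count_leaves(node) -> int:
    """Count primitive descendants; empty containers contribute zero."""
    if isinstance(node, dict):
        return sum(count_leaves(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_leaves(item) for item in node)
    return 1  # any primitive, including null (None)


def require_leaves(doc) -> int:
    total = count_leaves(doc)
    if total == 0:
        raise ValueError("empty_leaf_set")
    return total


print(require_leaves({"a": {}, "b": [1, 2]}))  # 2
print(require_leaves({"a": None}))             # 1
```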

4. Leaf-hash preimage

4.1 Byte layout

Decision (forever): the leaf hash is computed over the following byte sequence. The layout is shared with the other v1 profiles in this set — same separator byte, same field order, same encoding rules — so a cross-profile reader implements one preimage builder and only the per-field rules vary.

leaf_hash = sha256(
    profile_literal_utf8     // "satsignal.json.field.v1" as UTF-8 bytes (23 bytes)
  || 0x00                    // separator
  || leaf_id_utf8            // JSON Pointer as UTF-8 bytes (e.g. "/items/0/sku")
  || 0x00                    // separator
  || value_canonical_bytes   // see §4.2
  || 0x00                    // separator
  || salt_bytes              // raw bytes from base64-decoding salt_b64 (16 bytes)
)

The three 0x00 separator bytes are pinned forever for v1 and are identical across the v1 profile set. Rationale: the separators prevent boundary smuggling: an attacker who can choose leaf_id and value should not be able to craft a value-suffix that, concatenated with the next field's prefix, produces the same preimage as a different (leaf_id, value) pair. Without separators, leaf_id="/a" + value=12 and leaf_id="/a1" + value=2 would both contribute the bytes /a12 and produce identical preimage bytes; with 0x00 separators they differ because the JSON-canonical encoding of a string value cannot itself contain 0x00 (control characters MUST be escaped as \uXXXX per §2.4) and a JSON Pointer cannot contain 0x00 either (it would be invalid UTF-8 if inserted raw, and ~-escaping does not produce nulls).

4.2 value_canonical_bytes

Decision (forever): the leaf's primitive value, serialized in JCS form (§2.4) as UTF-8 bytes. The JSON-token form IS the canonical value — the surrounding quotes for strings, the lowercase true/false/null literals, the canonical number form. The verifier does NOT unwrap string quotes before hashing.

Examples (all in UTF-8 bytes):

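| Primitive value | value_canonical_bytes (text) | Length |
| --- | --- | --- |
| string "hello" | "hello" | 7 |
| string "" | "" | 2 |
| string "x" | "x" | 3 |
| number 42 | 42 | 2 |
| number 0 | 0 | 1 |
| number -1 | -1 | 2 |
| number 3.14 | 3.14 | 4 |
| boolean true | true | 4 |
| boolean false | false | 5 |
| null | null | 4 |
| string "42" | "42" | 4 |
| string "he said \"hi\"" | "he said \"hi\"" (JCS-escaped form) | 16 |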

Rationale: keeping the JSON-token form prevents two distinct primitives from colliding under hash. Without quotes, the string "42" and the number 42 would produce identical value_canonical_bytes (both 42) and identical leaf_hash — breaking the property that distinct leaves produce distinct commitments. The JSON-token form preserves the type tag.
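
The type-tag property can be illustrated with Python's standard json module. This is a sketch for non-float primitives only, on the assumption that json.dumps with ensure_ascii=False produces the same token as JCS for strings, integers, booleans and null; floats are deliberately refused because they need a real RFC 8785 library. The name value_token_sketch is hypothetical.

```python
import json


def value_token_sketch(value) -> bytes:
    """Illustrative JSON-token bytes for str, int, bool and None only."""
    if isinstance(value, float):
        raise NotImplementedError("float formatting needs a real RFC 8785 library")
    if isinstance(value, (dict, list)):
        raise TypeError("containers are not leaves")
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


# The string "42" and the number 42 keep distinct byte forms.
print(value_token_sketch("42"))  # b'"42"'
print(value_token_sketch(42))    # b'42'
print(value_token_sketch(True), value_token_sketch(None))
```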

4.3 Worked example

Take fixture B1 (§8): {"name": "Alice", "age": 42}. The leaf at /age with the published salt produces this preimage and hash:

Full 48-byte preimage:

7361747369676e616c2e6a736f6e2e6669656c642e7631002f61676500343200ffe37d1dde1ae92a77bcc0c15f8901bd

leaf_hash = sha256(preimage) = 533e320213e48d42a8a9472c8ad12739a576ab172fd126b632ad4f27a79ae687.
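
The preimage above can be rebuilt from the §4.1 layout with the standard library. This is a sketch; the helper names leaf_preimage and leaf_hash_hex are hypothetical, and the inputs are the B1 /age leaf values published in §8.

```python
import base64
import hashlib

PROFILE_LITERAL = b"satsignal.json.field.v1"


def leaf_preimage(leaf_id: str, value_canonical: bytes, salt_b64: str) -> bytes:
    """profile || 0x00 || leaf_id || 0x00 || value || 0x00 || salt."""
    salt = base64.b64decode(salt_b64, validate=True)
    if len(salt) != 16:
        raise ValueError("salt must decode to exactly 16 bytes")
    # Joining four fields with b"\x00" yields the three separator bytes.
    return b"\x00".join(
        [PROFILE_LITERAL, leaf_id.encode("utf-8"), value_canonical, salt]
    )


def leaf_hash_hex(leaf_id: str, value_canonical: bytes, salt_b64: str) -> str:
    return hashlib.sha256(leaf_preimage(leaf_id, value_canonical, salt_b64)).hexdigest()


preimage = leaf_preimage("/age", b"42", "/+N9Hd4a6Sp3vMDBX4kBvQ==")
print(len(preimage))   # 48
print(preimage.hex())
print(leaf_hash_hex("/age", b"42", "/+N9Hd4a6Sp3vMDBX4kBvQ=="))
```

The printed preimage should match the 48-byte hex string above, and the last line is the value to compare against the published leaf_hash.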

5. Salts

5.1 Salt size

Decision (forever): 16 raw bytes per leaf, encoded in salt_b64 as standard base64 (RFC 4648 §4) with = padding. Rationale: 16 bytes (128 bits) is sufficient entropy to defeat brute-force preimage search against the leaf hash for low-entropy field values (a guessed candidate value cannot be tested against a target leaf_hash without also obtaining the salt). Going larger costs storage in proofs.json with no cryptographic gain at the sha256 output size.

5.2 Salt uniqueness

Decision (forever): each leaf MUST have a unique, independently-generated salt sourced from a cryptographically-secure pseudorandom number generator (CSPRNG). Salt reuse across leaves is forbidden — even within the same document. Rationale: reusing a salt would let an attacker who learns one revealed leaf's preimage relate its hash structure to other leaves' hashes; per-leaf salts make each leaf's preimage independent.

5.3 Salt persistence

Decision (forever): the anchorer persists the full salt-set client-side in the bundle's proofs.json (off-chain), keyed by leaf_id. The salt-set is NOT committed on-chain and NOT included in the canonical doc. Rationale: salts are part of the off-chain material an anchorer must retain to later produce a disclosure; losing them irrecoverably forfeits the ability to disclose. The on-chain commit binds the merkle root, which depends on the salts; the salts themselves are private until the anchorer chooses to reveal them as part of a disclosure record.
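
A minimal sketch of salt generation under §5.1 through §5.3, using Python's secrets module as the CSPRNG. The function name new_salt_set is hypothetical, and the plain dict keyed by leaf_id is only an assumed in-memory shape; the actual proofs.json layout is not defined in this section.

```python
import base64
import secrets


def new_salt_set(leaf_ids):
    """Return {leaf_id: salt_b64} with one independent 16-byte salt per leaf."""
    salts = {}
    for leaf_id in leaf_ids:
        raw = secrets.token_bytes(16)  # 128 bits from the OS CSPRNG
        salts[leaf_id] = base64.b64encode(raw).decode("ascii")  # RFC 4648, padded
    return salts


salt_set = new_salt_set(["/age", "/name"])
for leaf_id, salt_b64 in salt_set.items():
    print(leaf_id, salt_b64)
```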

6. Merkle behavior (cross-reference)

This profile defers all merkle-tree construction and proof invariants to ../disclosure-v1.md §3.4: hash algorithm pinning, single-leaf-tree behavior, odd-node promote-unchanged rule, leaf ordering source, and the per-step (sibling || frontier) vs (frontier || sibling) direction encoding. Per that section, raw 32-byte concatenation is used at every step; ASCII-hex concatenation is explicitly forbidden.

The only profile-defined input to that machinery is the leaf-set: its element count (every primitive value in the canonical form, per §3) and its total order (depth-first JCS-canonical-key order, per §3.3).
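
For orientation, the tree shape implied by the invariants listed above can be sketched as follows. The authoritative rules live in ../disclosure-v1.md §3.4, which is not reproduced here; this sketch ASSUMES a parent is sha256(left || right) over raw 32-byte digests with no additional prefix, and that an unpaired last node is promoted unchanged. The name merkle_root_sketch is hypothetical.

```python
import hashlib


def merkle_root_sketch(leaf_hashes):
    """Fold raw 32-byte leaf hashes into a root (assumed pairing rule)."""
    if not leaf_hashes:
        raise ValueError("empty_leaf_set")
    level = list(leaf_hashes)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            # Raw byte concatenation, never ASCII-hex concatenation.
            parents.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2 == 1:
            parents.append(level[-1])  # odd node promoted unchanged
        level = parents
    return level[0]


leaves = [hashlib.sha256(tag).digest() for tag in (b"a", b"b", b"c")]
print(merkle_root_sketch(leaves).hex())
```

With a single leaf the loop never runs, so the root equals the leaf hash, which matches the single-leaf behavior the fixtures rely on.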

7. Original anchor binding

A document anchored under this profile commits to its leaf-set through a chunk_merkle proof in the original .mbnt's canonical doc. The bound fields:

| Field | Value under this profile |
| --- | --- |
| subject.proofs.chunk_merkle.scheme | the literal string "satsignal.json.field.v1" |
| subject.proofs.chunk_merkle.algo | the literal string "sha256" (v1 is standard-mode only) |
| subject.proofs.chunk_merkle.leaf_count | total count of primitives in the JCS canonical form |
| subject.proofs.chunk_merkle.root | merkle root over leaves in depth-first JCS-canonical order |

A future sealed-mode variant of this profile would carry algo: "merkle-hmac-sha256" and a salt_version; sealed-mode JSON-field disclosure is out of scope for v1 (../disclosure-v1.md §4 step 5 fails closed on algo != "sha256").

A disclosure record bound to this profile carries linked_anchor.subject_profile == "satsignal.json.field.v1" and each revealed[i].profile == "satsignal.json.field.v1"; the disclosure verifier walks the binding chain in ../disclosure-v1.md §4 and the per-leaf recomputation in ../disclosure-v1.md §7 step 4.

8. Fixtures (test vectors)

The fixtures below are reproducible from the rules in §2–§5. All sha256 digests are computed (not placeholder); a verifier implementing this profile MUST reproduce them byte-for-byte. Salts in the fixtures are pinned deterministically — salt_b64 = base64(sha256("json-field-v1|" + leaf_id + "|" + idx)[:16]) — so the fixtures' hashes can be reproduced from public bytes alone. Production anchorers MUST source salts from a CSPRNG, not from a fixed derivation; the fixture salts exist purely to make the test vectors self-checkable.

B1 — minimal

Input:

{"name": "Alice", "age": 42}

JCS canonical bytes (length 25):

{"age":42,"name":"Alice"}

leaf_count = 2. Leaf records:

| i | leaf_id | value_canonical (text) | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /age | 42 | /+N9Hd4a6Sp3vMDBX4kBvQ== | 533e320213e48d42a8a9472c8ad12739a576ab172fd126b632ad4f27a79ae687 |
| 1 | /name | "Alice" | Aoki/skytzCabG+nnIwSaA== | e5cb099fc0fe04f443c0ff86879162159cfed1fa92c80862a86021d391f9563c |

Full merkle tree:

L0: [533e320213e48d42a8a9472c8ad12739a576ab172fd126b632ad4f27a79ae687,
     e5cb099fc0fe04f443c0ff86879162159cfed1fa92c80862a86021d391f9563c]
L1: [c1f5e68c87dcbf89ebd99b0967a34e81fb730b70569733c295f9a4769132e17c]

root = c1f5e68c87dcbf89ebd99b0967a34e81fb730b70569733c295f9a4769132e17c.

The per-byte preimage breakdown for leaf 0 is §4.3.

B2 — nested

Input:

{"customer": {"name": "Alice"}, "items": [{"sku": "x"}, {"sku": "y"}]}

JCS canonical bytes (length 63):

{"customer":{"name":"Alice"},"items":[{"sku":"x"},{"sku":"y"}]}

leaf_count = 3. Leaf records (depth-first, JCS-canonical-key order — customer sorts before items):

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /customer/name | "Alice" | y3NcBZqzS4zaRIIdZRW46A== | e27f140c6ce4897056e72d3990e28282dbc444564f6967f35c7b42e24e2dd800 |
| 1 | /items/0/sku | "x" | YxpPackcqe2LJXvDbgCfBQ== | 7510298f34bd4ca1a744238dc4c72c19d027494d0d958bed3d9b7bdcb76f824d |
| 2 | /items/1/sku | "y" | 5e0+PYeV10DXCyVzc8yntQ== | 6fa371b72e1bbd0e4a9bb0faba40d77179bd662a6cd47d9f9910dcb4ee58a309 |

B3 — array of primitives

Input:

{"tags": ["red", "green", "blue"]}

JCS canonical bytes (length 31):

{"tags":["red","green","blue"]}

leaf_count = 3. Leaves at index order:

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /tags/0 | "red" | 2L3qHOs6n5ZKmWAuJPyLCA== | 05a106a0b911d29b1a074a7348cc6a35808fdc3d8f09bcf6aa69950f823acd17 |
| 1 | /tags/1 | "green" | /3GljQQXj1WcZs5sHPlE3g== | b60e1b4a97863ada9d603201a44cb12ea32b777b15649196d3bd8f02eaf18837 |
| 2 | /tags/2 | "blue" | vTtfu+WKxLOek5DVN1W5kA== | d130a05a39bcde12b74aa9ff199b407f0bef526e7ebb58c8633861deb308f1d2 |

B4 — escaping in key

Input:

{"a/b": 1, "c~d": 2}

JCS canonical bytes (length 17):

{"a/b":1,"c~d":2}

leaf_count = 2. Note the RFC 6901 escaping in the leaf_id:

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /a~1b | 1 | 2PuUnKcgEGlYFi9yA4o+Sg== | 8c11a0c77968fcd807d7bb479351f126019d08f009a73fd0bae748175f138803 |
| 1 | /c~0d | 2 | g2QACzqIeCl9r0pyDbBaqA== | 3d40ab8c4502d035e7088ccfb2bfd8720089b052fa32c525015e53dc2f951249 |

The / in the key a/b is escaped to ~1; the ~ in c~d is escaped to ~0. The canonical JCS form preserves the raw key bytes verbatim (no pointer escaping) — escaping applies only to the leaf_id derived from the key.

B5 — key-order independence

Both inputs canonicalize to identical bytes and (under identical salts) produce identical leaves and identical merkle roots.

Input A:

{"b": 1, "a": 2}

Input B:

{"a": 2, "b": 1}

JCS canonical bytes (length 13) for both inputs:

{"a":2,"b":1}

leaf_count = 2. Leaf records (identical for A and B):

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /a | 2 | s09hGGBqe743ijuoZtXX5Q== | 0fc62b52da5aa247be1ccc875a80f766b583311e46b8757f1457e4921e8fbd2c |
| 1 | /b | 1 | sJS1rWYeell4nZY/eczBeQ== | a3f063d1c47b3bb40ec0fd59c0d64c1d4d179b9612a54618b17290a2543d9d2e |

Full merkle tree:

L0: [0fc62b52da5aa247be1ccc875a80f766b583311e46b8757f1457e4921e8fbd2c,
     a3f063d1c47b3bb40ec0fd59c0d64c1d4d179b9612a54618b17290a2543d9d2e]
L1: [fbe6a68a98d8f790c17910d7928bb059749566601e0a659a001742769425c2a9]

root = fbe6a68a98d8f790c17910d7928bb059749566601e0a659a001742769425c2a9 for both input A and input B.

B6 — number canonicalization

Both inputs canonicalize to identical bytes and produce an identical leaf.

Input A:

{"x": 1e10}

Input B:

{"x": 10000000000}

JCS canonical bytes (length 17) for both inputs:

{"x":10000000000}

leaf_count = 1. Leaf record (identical for A and B):

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /x | 10000000000 | x6FsYO5SXlSLgJclepIGrQ== | 8c1c6fc64c987c97126d6f89a673fcfd7ae8021dff940b56658cd9b36254a49f |

Because this is a single-leaf tree, the merkle root equals the leaf_hash directly (per ../disclosure-v1.md §3.4 invariant 2): root = 8c1c6fc64c987c97126d6f89a673fcfd7ae8021dff940b56658cd9b36254a49f, with proof_path = [] in any disclosure.

B7 — deeply nested (5 levels)

Input:

{"l1": {"l2": {"l3": {"l4a": {"l5x": "a", "l5y": "b"},
                       "l4b": {"l5z": "c"}}},
        "l2_other": "shallow"}}

JCS canonical bytes (length 89):

{"l1":{"l2":{"l3":{"l4a":{"l5x":"a","l5y":"b"},"l4b":{"l5z":"c"}}},"l2_other":"shallow"}}

leaf_count = 4. Depth-first order: at root key l1, descend into l2 (sorts before l2_other); at l2 descend into l3; at l3 descend into l4a (sorts before l4b); inside l4a emit l5x then l5y; back up to l4b and emit l5z; back up to l1's remaining key l2_other and emit. The sequence:

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /l1/l2/l3/l4a/l5x | "a" | 80KcYHXB/oHIaQTBShRJHw== | 13f4a8b217b1b6cf24e5b0d81e37071df1ce79f677a5d2a69cdda4c84b29044e |
| 1 | /l1/l2/l3/l4a/l5y | "b" | BdIpHXr8hwhTfuA0QH4/QA== | 91e4ba4fd003068817926556e3b637433881a4d4a124247ea55860344e1299d3 |
| 2 | /l1/l2/l3/l4b/l5z | "c" | r8NpIC2z1pJuc76tNwUBWg== | afdfaf21de5c84a470ed6620c417f0b855e0f23e9fc84d340739b003337df797 |
| 3 | /l1/l2_other | "shallow" | Zbhw7x3XGxWp18S/wmwBMw== | 1f843cf03d4819990c2de782e3f7d5be3927ed2bab4b07c8cf86e5af108fca0e |

This fixture exercises both the depth-first descent (the three deep-leaves emit before the shallow l2_other) and the JCS key-sort at each level (l2 before l2_other; l4a before l4b).

B8 — empty container ignored

Input:

{"a": {}, "b": [1, 2]}

JCS canonical bytes (length 18):

{"a":{},"b":[1,2]}

leaf_count = 2 (the empty object at /a contributes no leaves):

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /b/0 | 1 | OCp+U2WXiMn0e2gFrOe/iw== | dddc9b30277d36cb570a6753524066c79f292d0a409c3cdde1c04db189dc9ecf |
| 1 | /b/1 | 2 | sTl41+6TsYZDqLK6uBjt/w== | 9adf0476749d5f96335aacfedf3a0d92902dc518b4f4ce7ea3d0860659010e51 |

B9 — all four primitive types

Input:

{"s": "x", "n": 42, "b": true, "z": null}

JCS canonical bytes (length 34):

{"b":true,"n":42,"s":"x","z":null}

leaf_count = 4. JCS sorts keys: b, n, s, z.

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /b | true | 3uAJOOs4NDoURTt8IpEHUg== | 5f3493e5b8337640ad2b5a1db799a776a80c263bd54d2eef8d0fdc1072afd5fb |
| 1 | /n | 42 | gw566yLkzEmBS6VbGgLbHA== | ffd31facf8b5f0b481c18a861efd0116d6be68900205177f17527e9d17cf300a |
| 2 | /s | "x" | U+OG2AshqriwJ6vWssu0Aw== | e09e427a6396ade8cd34158196c2e7f782ae9b55db6f8def0453d73d5149994f |
| 3 | /z | null | bZZRVWIZmVu/4i/dxW+MBw== | ddcd08f44874a9db2f639efabe4b6ea129f1dd93843f327d6640a308fd5a9e70 |

Note the distinct value_canonical forms (true, 42, "x", null): each carries its JSON-token form (with quotes for strings; lowercase for boolean and null). The type tag is preserved in the hash.

B10 — Unicode key

Input:

{"日本語": "value"}

JCS canonical bytes (length 21 — {, ", 9 bytes of UTF-8 for 日本語, ", :, ", value, ", }):

{"日本語":"value"}

leaf_count = 1. The leaf_id carries the raw UTF-8 bytes of the key (no Punycode, no percent-encoding):

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /日本語 | "value" | tO3WWZRnu/GtAhbPcxwXRQ== | e9cb79b797d5b63386a673e9437f3e04a2b6efee5909dbf2f955a920949a539e |

A verifier implementing JSON Pointer per RFC 6901 emits the key's UTF-8 bytes verbatim into the pointer; the leaf-hash preimage byte sequence at the leaf_id_utf8 slot is therefore 2f e6 97 a5 e6 9c ac e8 aa 9e (/ followed by the nine UTF-8 bytes of 日本語).

B11 — supplementary-plane (astral) key

This fixture exercises the astral / UTF-16 surrogate pair edge of RFC 8785 §3.2.3 key sorting. The key 🎉 (U+1F389, PARTY POPPER) is a supplementary-plane code point; its UTF-8 encoding is four bytes (F0 9F 8E 89) and its UTF-16 encoding is the surrogate pair (D83C DF89).

Salt scheme (pinned). B11's leaves use deterministic salts bytes([0x10 + i]) * 16 for leaf index i, encoded base64. This makes the fixture independently reproducible; production anchorers MUST use a CSPRNG per §5.
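
The pinned B11 salts can be regenerated directly from the expression above; the helper name b11_salt_b64 is hypothetical.

```python
import base64


def b11_salt_b64(i: int) -> str:
    """Fixture-only salt: the byte 0x10 + i repeated 16 times, base64-encoded."""
    return base64.b64encode(bytes([0x10 + i]) * 16).decode("ascii")


print(b11_salt_b64(0))  # EBAQEBAQEBAQEBAQEBAQEA==
print(b11_salt_b64(1))  # EREREREREREREREREREREQ==
```

These deterministic salts exist only so the fixture can be recomputed; they are not a pattern for production use.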

Input:

{"🎉": "party", "a": 1}

JCS canonical bytes (length 22, hex 7b2261223a312c22f09f8e89223a227061727479227d):

{"a":1,"🎉":"party"}

(The astral character is emitted verbatim as UTF-8 bytes, NOT as a \ud83c\udf89 surrogate-pair escape. RFC 8785 §3.2.2.2 escapes only the JSON-required control characters and the ASCII characters " and \; all other code points pass through as their UTF-8 bytes.)

UTF-16 sort observation (pin): under RFC 8785's UTF-16 code-unit order, "a" (0061) sorts before "🎉" (D83C DF89) because 0x0061 < 0xD83C. Python's naive codepoint sort agrees for THIS pair (ord("a")=0x61 < ord("🎉")=0x1F389), so the two orderings coincide here. B11 demonstrates that JCS correctly handles an astral key in the leaf_id, but it does not by itself construct a key set where UTF-16 sort diverges from codepoint sort. A diverging set needs a BMP key whose first code unit lies in U+E000–U+FFFF: such a key sorts BEFORE the astral key under codepoint order (its scalar value is below U+10000) but AFTER it under UTF-16 code-unit order (its first code unit exceeds D83C). Code units in D800–DFFF cannot supply that key, because they form the surrogate range and are not legal standalone code points in a JSON string. No diverging fixture is pinned for v1; the rule prose in §2.4 is pinned instead: implementers MUST use a real RFC 8785 library.

leaf_count = 2. Leaf records (depth-first, JCS-canonical-key order — "a" sorts before "🎉"):

| i | leaf_id | leaf_id (hex) | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- | --- |
| 0 | /a | 2f 61 | 1 | EBAQEBAQEBAQEBAQEBAQEA== | f45379b15faae8f901456be33951bf7d3197e2de19f8f12e0914a9dfa20650db |
| 1 | /🎉 | 2f f0 9f 8e 89 | "party" | EREREREREREREREREREREQ== | 53461806857e1c8d1ccbdcc122b14b393204ad151608378f3caa44c712eb301e |

Full merkle tree:

L0: [f45379b15faae8f901456be33951bf7d3197e2de19f8f12e0914a9dfa20650db,
     53461806857e1c8d1ccbdcc122b14b393204ad151608378f3caa44c712eb301e]
L1: [75b6ca1a4ed11e957b93a79e59a2c151230f3995e6cef4b95f69036b2c77dc6a]

root = 75b6ca1a4ed11e957b93a79e59a2c151230f3995e6cef4b95f69036b2c77dc6a.

B12 — supplementary-plane string value

This fixture exercises the astral character in a string value edge of RFC 8785 §3.2.2.2. Per JCS, only the control characters and the ASCII characters " and \ are escaped; all other code points (including supplementary-plane) are emitted as their UTF-8 bytes verbatim. A JCS implementation that emits \ud83c\udf89 surrogate-pair escapes for the astral character would produce different canonical bytes and is non-conformant.

Salt scheme (pinned). bytes([0x20]) * 16 for the single leaf.

Input:

{"emoji": "Hello 🎉 World"}

JCS canonical bytes (length 28, hex 7b22656d6f6a69223a2248656c6c6f20f09f8e8920576f726c64227d):

{"emoji":"Hello 🎉 World"}

leaf_count = 1. Leaf record:

| i | leaf_id | value_canonical (text) | value_canonical (hex) | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- | --- |
| 0 | /emoji | "Hello 🎉 World" | 2248656c6c6f20f09f8e8920576f726c6422 | ICAgICAgICAgICAgICAgIA== | f3b3e429dad8919ddcf649285f2ce201a4bb521fc9c2fbfb0e655907fa492235 |

Single-leaf tree, so root = f3b3e429dad8919ddcf649285f2ce201a4bb521fc9c2fbfb0e655907fa492235 (per ../disclosure-v1.md §3.4 invariant 2).

B13 — float requiring shortest round-trip

This fixture exercises the floating-point shortest-round-trip edge of RFC 8785 §3.2.2.3. The sum 0.1 + 0.2 in IEEE 754 binary64 is the rational number with shortest decimal representation 0.30000000000000004; that representation — exactly 17 significant digits — is what RFC 8785 (and ECMA-262 §7.1.12 / Number.prototype.toString) prescribes. Both Python repr and a real RFC 8785 library agree on this particular value, but the fixture pins the canonical bytes so implementers can verify their library matches.

Salt scheme (pinned). bytes([0x30 + i]) * 16 for leaf index i.

Input (the sum field is literally the IEEE 754 result of 0.1 + 0.2):

{"x": 0.1, "y": 0.2, "sum": 0.30000000000000004}

JCS canonical bytes (length 43):

{"sum":0.30000000000000004,"x":0.1,"y":0.2}

Float canonicalization observation (pin): jcs.canonicalize produces 0.1 for the value 0.1, 0.2 for 0.2, and 0.30000000000000004 for 0.1 + 0.2. These are the unique shortest decimal representations that round-trip through IEEE 754 binary64, matching RFC 8785 §3.2.2.3 and ECMA-262 §7.1.12. Implementations whose float canonicalization produces (for example) 3.0000000000000004e-1 or 0.30000000000000005 for 0.1 + 0.2 are non-conformant and will produce a different merkle root.
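
The value pinned by this fixture is easy to observe in Python, whose repr happens to agree with the canonical form for this particular input (as noted above). This is only a sanity check on the arithmetic, not a canonicalizer.

```python
total = 0.1 + 0.2

print(repr(total))                            # 0.30000000000000004
print(total == 0.3)                           # False
print(float("0.30000000000000004") == total)  # True: the text round-trips
print(len("30000000000000004"))               # 17 significant digits
```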

leaf_count = 3. Leaf records (depth-first, JCS-canonical-key order — sum, x, y sort alphabetically under UTF-16 code-unit sort):

| i | leaf_id | value_canonical | salt_b64 | leaf_hash |
| --- | --- | --- | --- | --- |
| 0 | /sum | 0.30000000000000004 | MDAwMDAwMDAwMDAwMDAwMA== | b267dfb1b255784965e3d79dbf0f3b49b04e7811bb5239b3a7f08c81242297e0 |
| 1 | /x | 0.1 | MTExMTExMTExMTExMTExMQ== | 2d4e9189a942dce000a7c582e549e96ccc2e9615ef91388d62fd8750b8a4d2e2 |
| 2 | /y | 0.2 | MjIyMjIyMjIyMjIyMjIyMg== | 62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4 |

Full merkle tree (odd-node promote-unchanged rule from ../disclosure-v1.md §3.4 invariant 3):

L0: [b267dfb1b255784965e3d79dbf0f3b49b04e7811bb5239b3a7f08c81242297e0,
     2d4e9189a942dce000a7c582e549e96ccc2e9615ef91388d62fd8750b8a4d2e2,
     62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4]
L1: [2c931341ad58f50038a2a671e2a46c68bdb3be60cdc1f5f276878129fe6b50ed,
     62971f509b214dd62dd201a555b62e273a13e91380cb193bab8bed464b0f08a4]
L2: [1bdafd4b48bf230503eca1f44ea781b15258fe985331998ab203114791207764]

root = 1bdafd4b48bf230503eca1f44ea781b15258fe985331998ab203114791207764.

9. Out of scope for v1

The following are explicitly not addressed by this profile and are not implied by satsignal.json.field.v1:

10. Implementation note — why not a hand-rolled JCS

An earlier draft of this spec carried a hand-rolled JCS approximation as a reference implementation: sorted(dict.items(), key=lambda kv: kv[0]) for object keys, minimal-whitespace separators, and Python repr for non-integer floats. Two specific divergences from RFC 8785 were noted before that draft shipped:

  1. Object-key sort. Python's sorted is a codepoint sort (compares Unicode scalar values directly). RFC 8785 §3.2.3 mandates a UTF-16 code-unit sort. These diverge for supplementary-plane keys (U+10000 – U+10FFFF): under UTF-16 the first code unit of any astral character is a high surrogate (D800–DBFF), which sorts strictly AFTER any BMP code point below 0xD800 but strictly BEFORE BMP code points in the range E000–FFFF. The astral codepoint itself, by contrast, sorts after every BMP character under codepoint order. A pair of keys exists where the two orderings give opposite results; a spec-compliant verifier and a naive verifier would compute different canonical bytes (and different merkle roots) for the same input.
  2. Float canonicalization. Python's repr(float) is a shortest-round-trip representation, but it is not byte-for-byte identical to the ECMAScript 6.0 Number.prototype.toString algorithm that RFC 8785 §3.2.2.3 prescribes. The two agree on many ordinary inputs (including 0.1, 0.2, and 0.30000000000000004; see B13) but diverge in notation on others: for example, repr(1e16) yields 1e+16 where ECMAScript yields 10000000000000000, repr(1e-5) yields 1e-05 where ECMAScript yields 0.00001, and repr(1.0) yields 1.0 where ECMAScript yields 1. Relying on repr is therefore a forever-trap.
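
A few concrete inputs make the float divergence in item 2 visible. The right-hand strings below are the outputs ECMAScript's Number.prototype.toString gives for the same doubles; they are hard-coded here for comparison rather than computed, so treat this as an illustration and not as a test of any JCS library.

```python
# Python double -> string that ECMAScript prints for the same value.
ecmascript_form = {
    1e16: "10000000000000000",
    1e-5: "0.00001",
    1e-7: "1e-7",
    1.0: "1",
}

for value, expected in ecmascript_form.items():
    python_form = repr(value)
    marker = "same" if python_form == expected else "DIFFERENT"
    print(f"{python_form!r:>10} vs {expected!r:<22} {marker}")
```

Every row differs in notation, while the significant digits are the same in each case.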

Neither divergence affected fixtures B1–B10 (all BMP keys, integer or simple-decimal values), so the inline approximation was internally consistent with its own fixture set. But a real verifier using a real JCS library would have computed different canonical bytes for any input that crossed those edges. Fixtures B11 (astral key) and B13 (float requiring shortest-round-trip) sit next to those edges, although, as their own notes in §8 state, the naive and conformant outputs happen to coincide for those particular inputs.

The forever-contract in §2.4 closes this trap: profile-conformant implementations MUST use a real RFC 8785 library. For Python, that is the jcs PyPI module (version ≥ 0.2.1, jcs.canonicalize(obj) -> bytes); for other languages, any library whose test suite passes the RFC 8785 reference vectors. The fixtures in §8 were verified against jcs 0.2.1; verifier authors in any language SHOULD likewise re-verify every fixture's canonical bytes and leaf hashes against their chosen library before declaring implementation conformance.

Questions about this specification? Email hello@satsignal.cloud.