Token-Oriented Object Notation (TOON): A Faster, Smaller, AI-Ready Alternative to JSON

Modern applications run on structured data — from APIs and event streams to AI pipelines and vector databases. JSON has been the default for decades, but today’s AI workloads reveal its limitations: excessive punctuation, inefficient tokenization, and unnecessary payload size.

TOON (Token-Oriented Object Notation) is a clean, compact, token-driven way to represent structured data with fewer characters and fewer LLM tokens than the equivalent JSON.


🔷 What Is TOON?

TOON is a minimal, token-stream based representation of objects and arrays.

Instead of punctuation-heavy symbols like { } [ ] : and , TOON uses single semantic tokens:

Symbol | Meaning
O      | Start object
E      | End object
A      | Start array
Z      | End array

Keys and values follow as simple space-separated tokens, with no quoting noise.

Example:

O id 12345 name TokenTest active true scores A 10 20 30 Z E
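
To make the token rules concrete, here is a minimal Go sketch of a parser for this form. It rests on assumptions the article does not spell out: tokens are separated by whitespace, scalars are kept as plain strings with no type inference, and the bare words O, E, A and Z are never used as values. The names parseValue and ParseTokens are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseValue consumes one value starting at toks[pos] and returns it with
// the position of the next unread token. Scalars are kept as plain strings.
func parseValue(toks []string, pos int) (any, int, error) {
	if pos >= len(toks) {
		return nil, pos, errors.New("unexpected end of input")
	}
	switch toks[pos] {
	case "O": // object: alternating key/value tokens until E
		obj := map[string]any{}
		pos++
		for pos < len(toks) && toks[pos] != "E" {
			key := toks[pos]
			val, next, err := parseValue(toks, pos+1)
			if err != nil {
				return nil, next, err
			}
			obj[key] = val
			pos = next
		}
		if pos >= len(toks) {
			return nil, pos, errors.New("object not closed with E")
		}
		return obj, pos + 1, nil
	case "A": // array: values until Z
		arr := []any{}
		pos++
		for pos < len(toks) && toks[pos] != "Z" {
			val, next, err := parseValue(toks, pos)
			if err != nil {
				return nil, next, err
			}
			arr = append(arr, val)
			pos = next
		}
		if pos >= len(toks) {
			return nil, pos, errors.New("array not closed with Z")
		}
		return arr, pos + 1, nil
	case "E", "Z": // a closing token where a value was expected
		return nil, pos, fmt.Errorf("unexpected %q at token %d", toks[pos], pos)
	default: // scalar
		return toks[pos], pos + 1, nil
	}
}

// ParseTokens parses one whitespace-separated TOON document (assumed syntax).
func ParseTokens(s string) (any, error) {
	toks := strings.Fields(s)
	v, next, err := parseValue(toks, 0)
	if err != nil {
		return nil, err
	}
	if next != len(toks) {
		return nil, errors.New("trailing tokens after value")
	}
	return v, nil
}

func main() {
	v, err := ParseTokens("O id 12345 name TokenTest active true scores A 10 20 30 Z E")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(v)
}
```

Because there is no quoting, a value that contains a space cannot be expressed in this sketch; a real format would need a rule for that case.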

🔷 Where TOON Is Useful

TOON works exceptionally well in:

  • High-performance APIs
  • Real-time analytics pipelines
  • IoT telemetry streams
  • Low-bandwidth environments
  • Message queues & event buses
  • Binary protocol bridges
  • AI systems, embeddings, LLM pipelines & vector DBs

🔷 Why TOON Is Better Than JSON

1️⃣ Fewer Characters, Fewer Tokens, Smaller Payloads

  • No braces {}
  • No brackets []
  • No quotes "
  • No commas ,
  • No colons :

This means significantly fewer bytes and significantly fewer AI tokens.


2️⃣ Stream-Friendly Parsing

TOON is token-first—every element has meaning.

You can:

✔ parse incrementally
✔ process infinite streams
✔ avoid deep lookahead
✔ eliminate heavy memory buffers


3️⃣ Lower CPU & Memory Overhead

TOON parsers:

  • require fewer allocations
  • avoid escape-handling overhead
  • run predictably in O(n)

Perfect for constrained devices or high-throughput servers.


4️⃣ AI-Optimized by Design

No unnecessary punctuation → LLMs tokenize TOON extremely efficiently.

This directly reduces:

  • prompt cost
  • inference latency
  • context window usage
  • embedding size
  • bandwidth across pipelines

🔷 Serialization & Deserialization Improvements

Serialization with TOON

  • Emit only meaningful tokens
  • No quoting or escaping rules for keys
  • Output is always compact and linear
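
A serializer along these lines can be sketched in a few lines of Go. This is an illustration under assumptions rather than a reference encoder: it handles only map[string]any, []any and scalars, prints scalars with fmt, and sorts object keys so the output is deterministic (Go map iteration order is random). EncodeTokens and writeValue are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// writeValue appends the token form of v to b. Supported kinds are
// map[string]any (object), []any (array), and scalars printed with fmt.
func writeValue(b *strings.Builder, v any) {
	switch x := v.(type) {
	case map[string]any:
		b.WriteString("O")
		keys := make([]string, 0, len(x))
		for k := range x {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic output
		for _, k := range keys {
			b.WriteString(" " + k + " ")
			writeValue(b, x[k])
		}
		b.WriteString(" E")
	case []any:
		b.WriteString("A")
		for _, e := range x {
			b.WriteString(" ")
			writeValue(b, e)
		}
		b.WriteString(" Z")
	default:
		fmt.Fprint(b, x) // scalar: written as-is, no quoting or escaping
	}
}

// EncodeTokens returns the space-separated token stream for v.
func EncodeTokens(v any) string {
	var b strings.Builder
	writeValue(&b, v)
	return b.String()
}

func main() {
	doc := map[string]any{
		"id":     12345,
		"name":   "TokenTest",
		"active": true,
		"scores": []any{10, 20, 30},
	}
	fmt.Println(EncodeTokens(doc))
	// prints: O active true id 12345 name TokenTest scores A 10 20 30 Z E
}
```

The keys come out in sorted order rather than insertion order, which is why the printed line differs from the earlier hand-written example only in field order.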

Deserialization with TOON

  • Token-by-token parsing
  • No brace matching
  • No need to buffer entire documents
  • Works perfectly for streaming ingestion

🔷 Token Count Comparison: JSON vs TOON

JSON

{"id":12345,"name":"TokenTest","active":true,"scores":[10,20,30]}

JSON lexical tokens: 23
(LLM subword-token counts are tokenizer-dependent and will differ)


TOON

O id 12345 name TokenTest active true scores A 10 20 30 Z E

TOON tokens: 14
About 39% fewer lexical tokens


🔷 Why TOON Is Better for AI Data

LLMs don’t process characters.
They process tokens.

JSON is token-heavy because:

  • Quotes become tokens
  • Braces become tokens
  • Colons + commas become tokens
  • Escapes create fragmentation
  • Nested structures amplify noise
  • Key names repeat everywhere

Example: JSON Tokenization

{"id":123}

Often becomes:

{
"
id
"
:
123
}

7–10 tokens for a trivial object.

Equivalent TOON:

O id 123 E

Often → 4 tokens.


🔷 AI Advantages Summary

Feature             | JSON         | TOON
Token cost          | ❌ High      | ✅ Low
Embedding size      | ❌ Larger    | ✅ Smaller
RAG chunking        | ❌ Noisy     | ✅ Clean
Finetuning          | ❌ Expensive | ✅ Efficient
Tokenization errors | ❌ Common    | ✅ Minimal
Pipeline bandwidth  | ❌ Wasteful  | ✅ Optimized

TOON literally reduces the cost of using LLMs.


🔷 Why JSON Is Bad for AI Pipelines

1. Token Noise

Punctuation can account for roughly 20–40% of total tokens, depending on the payload and the tokenizer.

2. Repeated Quotes

"name" often becomes 3 tokens.

3. Extra Cognitive Load for Models

LLMs must “mentally parse” braces and punctuation.

4. Wasted Context

JSON is not space-efficient.

5. Noisy Embeddings

Extra punctuation reduces embedding clarity.

6. Hard for Incremental Parsing

Models must read entire objects before interpreting structure.


🔷 JSON vs TOON for LLM Tokenization

JSON

{"user":{"id":123,"role":"admin"}}

~22–28 tokens

TOON

O user O id 123 role admin E E

~10 tokens
55–65% reduction


🔷 Compact TOON Spec (Used in Implementation Below)

A practical, minimal TOON encoding:

Type         | Marker | Example
Number       | #      | id#123
String       | '      | name'Suraj
Bool         | !      | active!1 (1 = true, 0 = false)
String array | ( )    | tags(5:photo,6:editor), each element written as length:value

Field separator: ,
Strings: start with ' and run to the next field separator; a literal ' inside a string is escaped as \'.

Example:

id#123,name'Suraj,active!1,tags(5:photo,6:editor)

🔷 JSON vs TOON Example (Character Size)

JSON

{"id":123,"name":"Suraj","active":true,"tags":["photo","editor"]}

Characters: 65

TOON

id#123,name'Suraj,active!1,tags(5:photo,6:editor)

Characters: 49

About 25% smaller
~25–40% fewer tokens depending on tokenizer


🔷 Go Implementation: Encode & Decode TOON

package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Record is a sample data model
type Record struct {
	ID     int
	Name   string
	Active bool
	Tags   []string
}

// escapeString escapes single quote for our TOON 'string' format
func escapeString(s string) string {
	return strings.ReplaceAll(s, "'", `\'`)
}

// unescapeString reverses escape
func unescapeString(s string) string {
	return strings.ReplaceAll(s, `\'`, "'")
}

// EncodeTOON encodes Record into compact TOON string:
// id#123,name'Suraj,active!1,tags(5:photo,6:editor)
func EncodeTOON(r Record) string {
	var b strings.Builder
	b.WriteString("id#")
	b.WriteString(strconv.Itoa(r.ID))
	b.WriteString(",name'")
	b.WriteString(escapeString(r.Name))
	b.WriteString(",active!")
	if r.Active {
		b.WriteString("1")
	} else {
		b.WriteString("0")
	}
	// tags as (len:val,...) with length for safe delimiting
	b.WriteString(",tags(")
	for i, t := range r.Tags {
		if i > 0 {
			b.WriteString(",")
		}
		b.WriteString(strconv.Itoa(len(t)))
		b.WriteString(":")
		b.WriteString(escapeString(t))
	}
	b.WriteString(")")
	return b.String()
}

// DecodeTOON decodes the compact TOON back into Record.
// It's a simple parser tailored to the spec above.
func DecodeTOON(s string) (Record, error) {
	r := Record{}
	parts := splitTopLevel(s, ',') // split on top-level commas
	for _, p := range parts {
		if p == "" {
			continue
		}
		switch {
		case strings.HasPrefix(p, "id#"):
			nstr := strings.TrimPrefix(p, "id#")
			n, err := strconv.Atoi(nstr)
			if err != nil {
				return r, err
			}
			r.ID = n

		case strings.HasPrefix(p, "name'"):
			v := strings.TrimPrefix(p, "name'")
			r.Name = unescapeString(v)

		case strings.HasPrefix(p, "active!"):
			v := strings.TrimPrefix(p, "active!")
			if v == "1" {
				r.Active = true
			} else if v == "0" {
				r.Active = false
			} else {
				return r, errors.New("invalid bool value")
			}

		case strings.HasPrefix(p, "tags(") && strings.HasSuffix(p, ")"):
			inner := p[len("tags(") : len(p)-1]
			// elements are comma-separated; each element is len:val
			if inner == "" {
				r.Tags = []string{}
				continue
			}
			elemParts := splitTopLevel(inner, ',')
			var tags []string
			for _, e := range elemParts {
				// find ':' separator
				idx := strings.Index(e, ":")
				if idx < 0 {
					return r, errors.New("invalid tag element")
				}
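				// The length prefix is not used for delimiting in this simple
				// parser, but we validate it against the unescaped value.
				val := unescapeString(e[idx+1:])
				if n, err := strconv.Atoi(e[:idx]); err != nil || n != len(val) {
					return r, errors.New("invalid tag length")
				}
				tags = append(tags, val)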
			}
			r.Tags = tags

		default:
			// ignore unknown fields or return error depending on policy
			// we'll ignore so it's forward-compatible
		}
	}
	return r, nil
}

// splitTopLevel splits a string by sep but does not split inside parentheses.
// Useful for our simple comma-separated top-level design.
func splitTopLevel(s string, sep rune) []string {
	var out []string
	level := 0
	start := 0
	for i, ch := range s {
		if ch == '(' {
			level++
		} else if ch == ')' {
			if level > 0 {
				level--
			}
		} else if ch == sep && level == 0 {
			out = append(out, s[start:i])
			start = i + 1
		}
	}
	// last
	if start <= len(s)-1 {
		out = append(out, s[start:])
	}
	return out
}

func main() {
	rec := Record{
		ID:     123,
		Name:   "Suraj",
		Active: true,
		Tags:   []string{"photo", "editor"},
	}

	toon := EncodeTOON(rec)
	fmt.Println("TOON:", toon)
	fmt.Println("TOON length:", len(toon))

	jsonExample := `{"id":123,"name":"Suraj","active":true,"tags":["photo","editor"]}`
	fmt.Println("JSON:", jsonExample)
	fmt.Println("JSON length:", len(jsonExample))

	// decode back
	parsed, err := DecodeTOON(toon)
	if err != nil {
		fmt.Println("Decode error:", err)
	} else {
		fmt.Printf("Parsed struct: %+v\n", parsed)
	}
}

Output:
TOON: id#123,name'Suraj,active!1,tags(5:photo,6:editor)
TOON length: 49
JSON: {"id":123,"name":"Suraj","active":true,"tags":["photo","editor"]}
JSON length: 65
JSON length: 65
Parsed struct: {ID:123 Name:Suraj Active:true Tags:[photo editor]}
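
The encoder writes a length before each tag "for safe delimiting", but the decoder above still splits tags on commas, so a tag that contains a comma would not survive a round trip. The sketch below shows one way the length prefix could be used instead. It is an illustration under assumptions, not part of the implementation above: encodeTags and decodeTags are hypothetical helpers, tag values are written raw with no \' escaping so the byte length matches what is written, and only the inner part of tags(...) is handled.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// encodeTags writes tags as len:val,len:val with raw (unescaped) values.
func encodeTags(tags []string) string {
	var b strings.Builder
	for i, t := range tags {
		if i > 0 {
			b.WriteByte(',')
		}
		b.WriteString(strconv.Itoa(len(t)))
		b.WriteByte(':')
		b.WriteString(t)
	}
	return b.String()
}

// decodeTags reads the inner part of tags(...) using each length prefix to
// find where the value ends, so commas or parentheses inside a tag are safe.
func decodeTags(inner string) ([]string, error) {
	tags := []string{}
	for pos := 0; pos < len(inner); {
		colon := strings.IndexByte(inner[pos:], ':')
		if colon < 0 {
			return nil, errors.New("missing ':' after length")
		}
		n, err := strconv.Atoi(inner[pos : pos+colon])
		if err != nil || n < 0 {
			return nil, errors.New("invalid length prefix")
		}
		start := pos + colon + 1
		if n > len(inner)-start {
			return nil, errors.New("length prefix exceeds input")
		}
		end := start + n
		tags = append(tags, inner[start:end])
		pos = end
		if pos < len(inner) {
			if inner[pos] != ',' {
				return nil, errors.New("expected ',' between tags")
			}
			pos++ // skip the separator
		}
	}
	return tags, nil
}

func main() {
	enc := encodeTags([]string{"photo", "a,b", "x)y"})
	fmt.Println(enc) // 5:photo,3:a,b,3:x)y
	dec, err := decodeTags(enc)
	fmt.Printf("%q %v\n", dec, err)
}
```

A full decoder would also need to locate the end of the tags field by length rather than by scanning for the closing parenthesis, since splitTopLevel treats any ")" as structure.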


🎯 Conclusion

TOON shows how a small shift in how we represent data can create a massive impact — especially in AI-driven systems.

By eliminating JSON’s punctuation-heavy structure and replacing it with a clean, semantic token stream, TOON delivers:

  • lower CPU usage
  • smaller payloads
  • faster serialization
  • efficient streaming
  • dramatically reduced LLM token costs
  • cleaner embeddings
  • more effective RAG pipelines

As AI becomes the core of modern software, formats optimized for token efficiency will outperform legacy notations. TOON isn’t just a JSON alternative — it’s a next-generation data model designed for AI-native systems.


🔗 References & Further Reading

  • Official Go Playground (TOON example): https://go.dev/play/p/LhJQFw-KsL6
  • Understanding tokenization in LLMs
  • OpenAI & Anthropic tokenizer behavior
  • Embedding and RAG optimization best practices
