The data format for the AI era
Indent Comma Format (ICF) combines the compactness of CSV, the readability of YAML and the hierarchy of JSON. It is schema-driven, streamable, and token-efficient by design.
@kind icf
@version 1.0
@specification https://icformat.org/icf/specification/v1
@schema
Invoice:
[InvoiceNo, InvoiceDate, Amount]
BillItems[]:
[SNo, Item, Quantity, Rate, Amount]
@data
@record id=INV001
Invoice:
= INV-2026-001, 2026-05-01, 84500
BillItems:
- 1, Cement, 100, 420, 42000
- 2, Steel Rod, 50, 850, 42500
Declare the schema once. Store every record positionally. No repeated keys, no closing tags.
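To make "positional" concrete, here is a minimal Python sketch that pairs the values from the example above with the field names declared in its schema. It is an illustration only: `SCHEMA` and `decode_row` are hypothetical names, not part of icfj or icfpy, and the sketch assumes that `=` marks an object row, `-` marks a collection item (as the example suggests), and that no value contains an escaped comma.

```python
# Illustrative sketch only: not the icfj/icfpy API and not a conformant ICF parser.
# Assumption: values contain no escaped commas, so a plain split is enough here.

SCHEMA = {
    "Invoice": ["InvoiceNo", "InvoiceDate", "Amount"],
    "BillItems": ["SNo", "Item", "Quantity", "Rate", "Amount"],
}

def decode_row(container: str, line: str) -> dict:
    """Pair the positional values on one data line with the schema's field names."""
    # Drop the leading row marker ("=" or "-", as inferred from the example).
    payload = line.lstrip()[1:]
    values = [value.strip() for value in payload.split(",")]
    return dict(zip(SCHEMA[container], values))

invoice = decode_row("Invoice", "= INV-2026-001, 2026-05-01, 84500")
items = [
    decode_row("BillItems", "- 1, Cement, 100, 420, 42000"),
    decode_row("BillItems", "- 2, Steel Rod, 50, 850, 42500"),
]
```

Values come back as strings here; whether and how ICF types them is left to the specification.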
Compact, readable, and built to stream
ICF defines each schema once and stores the data that follows positionally, so field names are never repeated per record. The result is smaller files, faster parsing, and fewer tokens when the data is passed to a language model.
CSV-level compactness
Field names live in the schema, not in every row. Records are bare comma-separated values with no per-row keys.
YAML-level readability
Indentation expresses hierarchy. No brackets to balance, no quotes to escape — a human can read and hand-edit an ICF file with ease.
JSON-level hierarchy
Objects, collections, nested containers and master data — the full shape of a business document, not a flat table.
AI-efficient tokens
Fewer repeated keys means fewer tokens. Ideal for RAG datasets, LLM context windows and structured extraction pipelines.
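A rough way to see the effect is to serialize the same rows both ways and compare lengths. The sketch below uses character count as a stand-in for tokens, which is an assumption: real token counts depend on the tokenizer. The positional text is built by hand to resemble the example on this page, not produced by an ICF library.

```python
import json

fields = ["SNo", "Item", "Quantity", "Rate", "Amount"]
rows = [
    [1, "Cement", 100, 420, 42000],
    [2, "Steel Rod", 50, 850, 42500],
]

# Keyed form: every row repeats every field name.
keyed = json.dumps([dict(zip(fields, row)) for row in rows])

# Positional form: field names appear once, rows carry values only.
positional = "\n".join(
    ["[" + ", ".join(fields) + "]"]
    + ["- " + ", ".join(str(value) for value in row) for row in rows]
)

# Character counts are only a proxy for tokens.
print(len(keyed), len(positional))
```

The gap widens as rows are added, because the keyed form pays for the field names on every row while the positional form pays once.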
Streamable & append-friendly
Records are line-oriented and self-contained. Append new records without rewriting the file; process huge archives without loading them whole.
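As a sketch of what line-oriented streaming can look like, the Python below yields one record at a time and shows an append as a plain write at the end of the stream. It assumes each record starts at an `@record` line after `@data`, as in the example on this page. `iter_records` is a hypothetical helper and the `INV002` record is made-up sample data; neither comes from icfj or icfpy.

```python
import io
from typing import Iterator, List, TextIO

def iter_records(stream: TextIO) -> Iterator[List[str]]:
    """Yield one record at a time as a list of lines, without loading the whole file."""
    current: List[str] = []
    in_data = False
    for raw in stream:
        line = raw.rstrip("\n")
        if not in_data:
            # Skip header and schema lines until the data section starts.
            in_data = line.strip() == "@data"
            continue
        if line.lstrip().startswith("@record"):
            if current:
                yield current
            current = [line]
        elif current:
            current.append(line)
    if current:
        yield current

doc = io.StringIO(
    "@schema\n"
    "Invoice:\n"
    "[InvoiceNo, InvoiceDate, Amount]\n"
    "@data\n"
    "@record id=INV001\n"
    "Invoice:\n"
    "= INV-2026-001, 2026-05-01, 84500\n"
)

# Appending is a write at the end of the stream; nothing earlier is touched.
doc.seek(0, io.SEEK_END)
doc.write("@record id=INV002\nInvoice:\n= INV-2026-002, 2026-05-02, 1200\n")

doc.seek(0)
records = list(iter_records(doc))
```

With a file on disk, the same append would be an `open(path, "a")` followed by a write, and the generator would hold only one record in memory at a time.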
Schema-driven validation
The declared schema makes records predictable and verifiable. Field counts, hierarchy and ordering are all checkable up front.
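A field-count check is the simplest of these validations. The sketch below is one possible shape for it, not the validator shipped with icfj or icfpy: `check_record` is a hypothetical name, it only checks container names and value counts, and it assumes values contain no escaped commas.

```python
from typing import Dict, List, Optional

SCHEMA = {
    "Invoice": ["InvoiceNo", "InvoiceDate", "Amount"],
    "BillItems": ["SNo", "Item", "Quantity", "Rate", "Amount"],
}

def check_record(schema: Dict[str, List[str]], lines: List[str]) -> List[str]:
    """Return the problems found in one record; an empty list means it passed."""
    problems: List[str] = []
    container: Optional[str] = None
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("@"):
            continue  # blank lines and directives carry no positional values
        if line[0] in "=-":
            # Data row: compare its value count with the declared field count.
            if container is not None:
                got = len(line[1:].split(","))
                want = len(schema[container])
                if got != want:
                    problems.append(
                        f"line {number}: {container} expects {want} values, got {got}"
                    )
            continue
        if line.endswith(":"):
            container = line[:-1]
            if container not in schema:
                problems.append(f"line {number}: unknown container {container!r}")
                container = None
    return problems

good = [
    "@record id=INV001",
    "Invoice:",
    "= INV-2026-001, 2026-05-01, 84500",
    "BillItems:",
    "- 1, Cement, 100, 420, 42000",
]
bad = good[:-1] + ["- 2, Steel Rod, 50, 850"]  # one value short on purpose

good_problems = check_record(SCHEMA, good)
bad_problems = check_record(SCHEMA, bad)
```

Because the schema is known before any data arrives, a check like this can run row by row as the file streams in.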
ICF — Indent Comma Format
The core serialization format: metadata directives, schema definitions, objects and collections, master data, escaping rules and preformatted text blocks. Stable at v1.0.
ICX — Indent Comma Index
An optional companion that indexes an ICF file for random access, fast lookup, integrity verification and incremental processing of large archives. Pure index — no business data.
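The idea of a pure index can be sketched as a map from record id to byte offset. This is a conceptual illustration only: the real ICX layout is defined by its specification and is not reproduced here. `build_offsets` and `read_record` are hypothetical names, and the `INV002` record is made-up sample data.

```python
import io
from typing import BinaryIO, Dict

def build_offsets(stream: BinaryIO) -> Dict[str, int]:
    """Map each record id to the byte offset of its @record line."""
    offsets: Dict[str, int] = {}
    while True:
        position = stream.tell()
        line = stream.readline()
        if not line:
            break
        text = line.decode("utf-8").strip()
        if text.startswith("@record"):
            # Attributes follow the directive as key=value pairs, e.g. id=INV001.
            for part in text.split()[1:]:
                key, _, value = part.partition("=")
                if key == "id":
                    offsets[value] = position
    return offsets

def read_record(stream: BinaryIO, offset: int) -> str:
    """Seek straight to one record and read until the next @record line or end of file."""
    stream.seek(offset)
    lines = [stream.readline().decode("utf-8")]
    while True:
        line = stream.readline()
        if not line or line.lstrip().startswith(b"@record"):
            break
        lines.append(line.decode("utf-8"))
    return "".join(lines)

archive = io.BytesIO(
    b"@data\n"
    b"@record id=INV001\n"
    b"Invoice:\n"
    b"= INV-2026-001, 2026-05-01, 84500\n"
    b"@record id=INV002\n"
    b"Invoice:\n"
    b"= INV-2026-002, 2026-05-02, 1200\n"
)

offsets = build_offsets(archive)
second = read_record(archive, offsets["INV002"])
```

The offset map holds ids and positions only, which mirrors the "no business data" property: the index can be rebuilt from the ICF file at any time.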
Purpose-built for structured business data
ICF is intentionally schema-constrained. That constraint is the trade-off that buys smaller files, faster parsing and predictable structure.
OCR extraction pipelines
Capture invoices and documents into a compact, hierarchical record stream with verbatim text blocks for raw OCR output.
Invoice & ERP interchange
Move structured documents between systems with master-data references that mirror relational tables.
Document archival
Store millions of records in a Git-friendly, human-readable form, indexed by a companion ICX file for random access.
AI / RAG datasets
Feed models dense, low-token structured context instead of verbose JSON or XML.
Document management
Export and import entire DMS hierarchies — folders, files, index data and line items — in one file.
Human-editable records
Config and reference data a person can actually read and edit, with no brackets to balance.
Zero-dependency libraries, ready to use
Faithful, behaviorally matched implementations for the JVM and Python. Both can parse, validate, build and write ICF, and generate ICX.
icfj
Pure Core Java, no runtime dependencies. The reference implementation and conformance authority.
icfpy
Pure Python, standard library only. A faithful behavioral port of icfj.
Validate ICF right in your browser
Paste an ICF document and get instant structural feedback on schema field counts, indentation, text blocks and more. Nothing leaves your machine.