Comparison of data serialization formats

From HandWiki

This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
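As a minimal illustration of what serialization means here, the sketch below uses Python's standard `json` module to turn a small nested object into a byte sequence and back. The sample value, the choice of JSON, and the UTF-8 encoding are assumptions made for the example; the comparison itself does not prescribe them.

```python
import json

# Sample object chosen for illustration: a mapping holding mixed values.
record = {"42": True, "A to Z": [1, 2, 3], "nothing": None}

# Serialize: object -> JSON text -> UTF-8 bytes.
encoded = json.dumps(record, sort_keys=True).encode("utf-8")

# Deserialize: bytes -> JSON text -> object.
decoded = json.loads(encoded.decode("utf-8"))

print(encoded)
print(decoded == record)
```

Any of the formats listed below could take the place of JSON in this round trip; they differ in how the intermediate byte sequence is laid out.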

Overview

Name Creator-maintainer Based on Standardized? Specification Binary? Human-readable? Supports references?[e] Schema-IDL? Standard APIs Supports Zero-copy operations
Apache Arrow Apache Software Foundation N/A No Apache Arrow™ Yes No N/A Yes (built-in) Java, Python, C++, JavaScript Yes
Apache Avro Apache Software Foundation N/A No Apache Avro™ 1.8.1 Specification Yes No N/A Yes (built-in) N/A N/A
Apache Parquet Apache Software Foundation N/A No Apache Parquet[1] Yes No No N/A Java, Python No
Argdata Nuxi, the Netherlands YAML Yes Argdata binary encoding Yes No No No No Yes
ASN.1 ISO, IEC, ITU-T N/A Yes ISO/IEC 8824; X.680 series of ITU-T Recommendations Yes
(BER, DER, PER, OER, or custom via ECN)
Yes
(XER, JER, GSER, or custom via ECN)
Partial[f] Yes (built-in) N/A N/A
Bale Murray N/A No Bale Specification Yes No No Yes N/A
Bencode Bram Cohen (creator)
BitTorrent, Inc. (maintainer)
N/A Yes Part of BitTorrent protocol specification Partially
(numbers and delimiters are ASCII)
No No No No N/A
Binn Bernardo Ramos N/A Yes Binn Specification Yes No No No No Yes
Bond Microsoft N/A No Bond IDL Specification Yes Yes
(JSON,
XML)
No Yes Yes
(C++, C#, Java, Python)
N/A
BSON MongoDB JSON Yes BSON Specification Yes No No No No N/A
Candle Markup Henry Luo XML, JSON, JavaFX Yes Candle Markup Reference No Yes Yes
(XPointer, XPath)
Yes
(Candle Pattern Reference)
Yes
(XQuery, XPath)
N/A
Cap’n Proto Kenton Varda N/A No Cap'n Proto Encoding Spec Yes Partial[h] Yes Yes No Yes
CBOR Carsten Bormann, P. Hoffman JSON (loosely) Yes RFC-7049 Yes No Yes
through tagging
Yes
(CDDL)
No Yes
Colfer Pascal de Kloe N/A No Specification Wiki Yes No No Yes No Yes
Comma-separated values (CSV) RFC author:
Yakov Shafranovich
N/A Partial
(myriad informal variants used)
RFC 4180
(among others)
No Yes No No No No
Common Data Representation (CDR) Object Management Group N/A Yes General Inter-ORB Protocol Yes No Yes Yes ADA, C, C++, Java, Cobol, Lisp, Python, Ruby, Smalltalk N/A
D-Bus Message Protocol freedesktop.org N/A Yes D-Bus Specification Yes No No Partial
(Signature strings)
Yes
(see D-Bus)
N/A
edn Rich Hickey N/A No Extensible Data Notation No Yes No No No N/A
Efficient XML Interchange (EXI) W3C XML, Efficient XML Yes Efficient XML Interchange (EXI) Format 1.0 Yes Yes
(XML)
Yes
(XPointer, XPath)
Yes
(XML Schema)
Yes
(DOM, SAX, StAX, XQuery, XPath)
N/A
ER7 Health Level 7 N/A Yes Health Level 7[2] Partially - Schema available Yes Yes Yes HAPI[1] No
Fast Binary Encoding Ivan Shynkarenka N/A No Fast Binary Encoding (FBE) specification Yes Partial[i] No Yes (built-in) C++, C#, Java, JavaScript, Kotlin, Python, Ruby Yes
Feather Wes McKinney and Hadley Wickham No [3] Yes No No N/A Python, R Yes
FlatBuffers Google N/A No flatbuffers github page Specification Yes Yes
(Apache Arrow)
Partial
(internal to the buffer)
Yes [4] C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript Yes
GVariant GLib D-Bus MP Yes GVariant Serialization Yes No No Yes
(Type strings)
No N/A
Fast Infoset ISO, IEC, ITU-T XML Yes ITU-T X.891 and ISO/IEC 24824-1:2007 Yes Yes
(XML)
Yes
(XPointer, XPath)
Yes
(XML schema)
Yes
(DOM, SAX, XQuery, XPath)
N/A
FHIR[2] Health Level 7 REST basics Yes Fast Healthcare Interoperability Resources[5] Yes Yes Yes Yes Hapi for FHIR[3] JSON, XML, Turtle No
HOCON Typesafe Inc. JSON No "HOCON (Human-Optimized Config Object Notation)" No Yes Yes ? Yes
(native Java API for all JVM languages)
No
Ion Amazon JSON No The Amazon Ion Specification Yes Yes No No No N/A
Java serialization Oracle Corporation N/A Yes Java Object Serialization Yes No Yes No Yes N/A
JSON Douglas Crockford JavaScript syntax Yes RFC 7159
(ancillary:
RFC 6901,
RFC 6902)
No, but see BSON, Smile, UBJSON Yes Yes
(JSON Pointer (RFC 6901);
alternately:
JSONPath, JPath, JSPON, json:select()), JSON-LD
Partial
(JSON Schema Proposal, ASN.1 with JER, Kwalify, Rx, Itemscript Schema), JSON-LD
Partial
(Clarinet, JSONQuery, JSONPath), JSON-LD
No
KMIP OASIS n/a Yes Oasis Yes (Tag, Type, Length, Value) Yes No No No N/A
MessagePack Sadayuki Furuhashi JSON (loosely) Yes MessagePack format specification Yes No No No No Yes
Named Binary Tag Mojang ? No Named Binary Tag specification Yes No No No ? No
Netstrings Dan Bernstein N/A Yes netstrings.txt Yes Yes No No No Yes
OGDL Rolf Veen ? Yes Specification Yes
(Binary Specification)
Yes Yes
(Path Specification)
Yes
(Schema WD)
N/A
OPC-UA Binary OPC Foundation N/A Yes opcfoundation.org Yes No Yes No No N/A
OpenDDL Eric Lengyel C, PHP Yes OpenDDL.org No Yes Yes No Yes
(OpenDDL Library)
N/A
PHP's serialize() & unserialize() PHP Group N/A Yes No Yes Yes Yes No Yes N/A
Pickle (Python) Guido van Rossum Python Yes [6] PEP 3154 -- Pickle protocol version 4 Yes No No No Yes
([7])
No
Data::Dumper format (Core Perl Module) Gurusamy Sarathy (ActiveState developer) Perl data types Yes No No Yes No ? Yes N/A
Property list NeXT (creator)
Apple (maintainer)
? Partial Public DTD for XML format Yes[a] Yes[b] No ? Cocoa, CoreFoundation, OpenStep, GnuStep No
Protocol Buffers (protobuf) Google N/A No Developer Guide: Encoding Yes Partial[d] No Yes (built-in) C++, C#, Java, Python, JavaScript, Go No
S-expressions John McCarthy (original)
Ron Rivest (internet draft)
Lisp, Netstrings Partial
(largely de facto)
"S-Expressions" Internet Draft Yes
("Canonical representation")
Yes
("Advanced transport representation")
No No N/A
SCaViS jWork.ORG N/A Yes No Yes Yes
(XML, Java Serialization, ProtocolBuffers)
Yes Yes
(Java object persistency, XML, ProtocolBuffers)
Yes
(Native Java API, bindings for Jython, JRuby, Groovy and others)
N/A
Simple Binary Encoding Real Logic N/A No SBE github page Yes No ? Yes
(XML)
C++, Java, C#, Go Yes
Smile Tatu Saloranta JSON Yes Smile Format Specification Yes No No Partial
(JSON Schema Proposal, other JSON schemas/IDLs)
Partial
(via JSON APIs implemented with Smile backend, on Jackson, Python)
N/A
SOAP W3C XML Yes W3C Recommendations:
SOAP/1.1
SOAP/1.2
Partial
(Efficient XML Interchange, Binary XML, Fast Infoset, MTOM, XSD base64 data)
Yes Yes
(built-in id/ref, XPointer, XPath)
Yes
(WSDL, XML schema)
Yes
(DOM, SAX, XQuery, XPath)
N/A
Structured Data eXchange Formats Max Wildgrube N/A Yes RFC 3072 Yes No No No N/A
Thrift Facebook (creator)
Apache (maintainer)
N/A No Original whitepaper Yes Partial[c] No Yes (built-in) N/A
transit Cognitect JSON
MessagePack
No Transit Format Yes Yes No No Yes
Clojure, Java, Python, Javascript, Ruby, Scala
Yes
UBJSON The Buzz Media, LLC JSON, BSON No [8] Yes No No No No N/A
VelocyPack (VPack) ArangoDB N/A No VelocyPack (VPack) Version 1 Specification Yes No Partial[g] No Yes
(C++ API reference implementation)
Yes
eXternal Data Representation (XDR) Sun Microsystems (creator)
IETF (maintainer)
N/A Yes RFC 4506 Yes No Yes Yes Yes N/A
XML W3C SGML Yes W3C Recommendations:
1.0 (Fifth Edition)
1.1 (Second Edition)
Partial
(Efficient XML Interchange, Binary XML, Fast Infoset, XSD base64 data)
Yes Yes
(XPointer, XPath)
Yes
(XML schema, RELAX NG)
Yes
(DOM, SAX, XQuery, XPath)
N/A
XML-RPC Dave Winer[4] XML Yes XML-RPC Specification No Yes No No No N/A
YAML Clark Evans,
Ingy döt Net,
and Oren Ben-Kiki
C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[5] Yes Version 1.2 No Yes Yes Partial
(Kwalify, Rx, built-in language type-defs)
No N/A
  • a. The current default format is binary.
  • b. The "classic" format is plain text, and an XML format is also supported.
  • c. Theoretically possible due to abstraction, but no implementation is included.
  • d. The primary format is binary, but a text format is available.[6]
  • e. Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
  • f. ASN.1 does offer OIDs, a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.
  • g. VelocyPack offers a value type to store pointers to other VPack items. It is allowed if the VPack data resides in memory, but not if stored on disk or sent over a network.
  • h. The primary format is binary, but a text format is available.[7][8]
  • i. The primary format is binary, but text and JSON formats are available.[9]
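The Bencode row above describes the format as only partially binary because numbers and delimiters are written in ASCII. The following is a minimal sketch of an encoder that makes this visible; the function name `bencode` and the decision to encode Python `str` values as UTF-8 byte strings are assumptions of this example, not part of the BitTorrent specification text quoted here.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts using Bencode rules."""
    if isinstance(value, bool):
        # Bencode has no boolean type, so refuse rather than guess.
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        # Integers: 'i' + ASCII decimal digits + 'e'.
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is stored as UTF-8
    if isinstance(value, bytes):
        # Byte strings: ASCII length + ':' + raw bytes.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode(685230))
print(bencode({"42": 1, "A to Z": [1, 2, 3]}))
```

The length prefixes and the `i`, `l`, `d`, `e` delimiters are readable ASCII, while the string payloads themselves may be arbitrary bytes.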

Syntax comparison of human-readable formats

Format Null Boolean true Boolean false Integer Floating-point String Array Associative array/Object
ASN.1
(XML Encoding Rules)
<foo /> <foo>true</foo> <foo>false</foo> <foo>685230</foo> <foo>6.8523015e+5</foo> <foo>A to Z</foo>
<SeqOfUnrelatedDatatypes>
    <isMarried>true</isMarried>
    <hobby />
    <velocity>-42.1e7</velocity>
    <bookname>A to Z</bookname>
    <bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
An object (the key is a field name):
<person>
    <isMarried>true</isMarried>
    <hobby />
    <height>1.85</height>
    <name>Bob Peterson</name>
</person>

A data mapping (the key is a data value):

<competition>
    <measurement>
        <name>John</name>
        <height>3.14</height>
    </measurement>
    <measurement>
        <name>Jane</name>
        <height>2.718</height>
    </measurement>
</competition>

[a]

Candle Markup (), "" true false 685230
-685230
6.8523015e+5 "A to Z"
"""
A
to
Z
"""
(true, (), -42.1e7, "A to Z")
_{%342=true A%20to%20Z=(1, 2, 3)}
or
_{
  _{key=42 value=true}
  _{key="A to Z" value=(1, 2, 3)}
}
CSV[b] null[a]
(or an empty element in the row)[a]
1[a]
true[a]
0[a]
false[a]
685230
-685230[a]
6.8523015e+5[a] A to Z
"We said, ""no""."
true,,-42.1e7,"A to Z"
42,1
A to Z,1,2,3
edn[10] nil true false 685230
-685230
685230N
6.8523015e+5
6.8523015e+5M
"A to Z"
"We said, \"no\"."
[true, nil, -42.1e7, "A to Z"] {"42" true "A to Z" [1, 2, 3]}
Ion

null
null.null
null.bool
null.int
null.float
null.decimal
null.timestamp
null.string
null.symbol
null.blob
null.clob
null.struct
null.list
null.sexp

true false 685230
-685230
0xA74AE
0b10100111010010101110
6.8523015e5 "A to Z"

'''
A
to
Z
'''
[true, null, -42.1e7, "A to Z"] {'42': true, 'A to Z': [1, 2, 3]}
Netstrings[c] 0:,[a]
4:null,[a]
1:1,[a]
4:true,[a]
1:0,[a]
5:false,[a]
6:685230,[a] 9:6.8523e+5,[a] 6:A to Z, 29:4:true,0:,7:-42.1e7,6:A to Z,, 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,[a]
JSON null true false 685230
-685230
6.8523015e+5 "A to Z" [true, null, -42.1e7, "A to Z"] {"42": true, "A to Z": [1, 2, 3]}
OGDL null[a] true[a] false[a] 685230[a] 6.8523015e+5[a] "A to Z"
'A to Z'
NoSpaces
true
null
-42.1e7
"A to Z"

(true, null, -42.1e7, "A to Z")

42
  true
"A to Z"
  1
  2
  3
42
  true
"A to Z", (1, 2, 3)
OpenDDL ref {null} bool {true} bool {false} int32 {685230}
int32 {0xA74AE}
int32 {0b10100111010010101110}
float {6.8523015e+5} string {"A to Z"} Homogeneous array:
int32 {1, 2, 3, 4, 5}

Heterogeneous array:

array
{
    bool {true}
    ref {null}
    float {-42.1e7}
    string {"A to Z"}
}
dict
{
    value (key = "42") {bool {true}}
    value (key = "A to Z") {int32 {1, 2, 3}}
}
PHP's serialize() & unserialize() N; b:1; b:0; i:685230;
i:-685230;
d:685230.15;[d]
d:INF;
d:-INF;
d:NAN;
s:6:"A to Z"; a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";} Associative array:
a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
Object:
O:8:"stdClass":2:{s:4:"John";d:3.14;s:4:"Jane";d:2.718;}[d]
Pickle (Python)
Property list
(plain text format)[11]
N/A <*BY> <*BN> <*I685230> <*R6.8523015e+5> "A to Z" ( <*BY>, <*R-42.1e7>, "A to Z" )
{
    "42" = <*BY>;
    "A to Z" = ( <*I1>, <*I2>, <*I3> );
}
Property list
(XML format)[12][13]
N/A <true /> <false /> <integer>685230</integer> <real>6.8523015e+5</real> <string>A to Z</string>
<array>
    <true />
    <real>-42.1e7</real>
    <string>A to Z</string>
</array>
<dict>
    <key>42</key>
    <true />
    <key>A to Z</key>
    <array>
        <integer>1</integer>
        <integer>2</integer>
        <integer>3</integer>
    </array>
</dict>
Protocol Buffers N/A true false 685230
-685230
20.0855369 "A to Z"
"sdfff2 \000\001\002\377\376\375"
"q\tqq<>q2&\001\377"
field1: "value1"
field1: "value2"
field1: "value3"
anotherfield {
  foo: 123
  bar: 456
}
anotherfield {
  foo: 222
  bar: 333
}
thing1: "blahblah"
thing2: 18923743
thing3: -44
thing4 {
  submessage_field1: "foo"
  submessage_field2: false
}
enumeratedThing: SomeEnumeratedValue
thing5: 123.456
[extensionFieldFoo]: "etc"
[extensionFieldThatIsAnEnum]: EnumValue
S-expressions NIL
nil
T
#t[f]
true
NIL
#f[f]
false
685230 6.8523015e+5 abc
"abc"
#616263#
3:abc
{MzphYmM=}
|YWJj|
(T NIL -42.1e7 "A to Z") ((42 T) ("A to Z" (1 2 3)))
transit[14] null true false 685230
-685230
"~n685230"
6.8523015e+5
"~f685230.15"
"A to Z"
"We said, \"no\"."
[true,null,-4.21E8,"A to Z"] ["^ ","42",true,"A to Z",[1,2,3]]
YAML ~
null
Null
NULL[15]
y
Y
yes
Yes
YES
on
On
ON
true
True
TRUE[16]
n
N
no
No
NO
off
Off
OFF
false
False
FALSE[16]
685230
+685_230
-685230
02472256
0x_0A_74_AE
0b1010_0111_0100_1010_1110
190:20:30[17]
6.8523015e+5
685.230_15e+03
685_230.15
190:20:30.15
.inf
-.inf
.Inf
.INF
.NaN
.nan
.NAN[18]
A to Z
"A to Z"
'A to Z'
[y, ~, -42.1e7, "A to Z"]
- y
-
- -42.1e7
- A to Z
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3]
XML[e] and SOAP <null />[a] true false 685230 6.8523015e+5 A to Z
<item>true</item>
<item xsi:nil="true"/>
<item>-42.1e7</item>
<item>A to Z</item>
<map>
  <entry key="42">true</entry>
  <entry key="A to Z">
    <item val="1"/>
    <item val="2"/>
    <item val="3"/>
  </entry>
</map>
XML-RPC <value><boolean>1</boolean></value> <value><boolean>0</boolean></value> <value><int>685230</int></value> <value><double>6.8523015e+5</double></value> <value><string>A to Z</string></value>
<value><array>
  <data>
  <value><boolean>1</boolean></value>
  <value><double>-42.1e7</double></value>
  <value><string>A to Z</string></value>
  </data>
  </array></value>
<value><struct>
  <member>
    <name>42</name>
    <value><boolean>1</boolean></value>
    </member>
  <member>
    <name>A to Z</name>
    <value>
      <array>
        <data>
          <value><int>1</int></value>
          <value><int>2</int></value>
          <value><int>3</int></value>
          </data>
        </array>
      </value>
    </member>
</struct>
  • a. ^ Omitted XML elements are commonly decoded by XML data binding tools as NULLs. Shown here is another possible encoding; XML schema does not define an encoding for this datatype.
  • b. ^ The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
  • c. ^ The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
  • d. ^ PHP will unserialize any floating-point number correctly, but will serialize them to their full decimal expansion. For example, 3.14 will be serialized to 3.140000000000000124344978758017532527446746826171875.
  • e. ^ XML data bindings and SOAP serialization tools provide type-safe XML serialization of programming data structures into XML. Shown are XML values that can be placed in XML elements and attributes.
  • f. ^ This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
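Footnote c notes that netstrings define only length-prefixed, possibly nested byte strings. The sketch below shows how the Netstrings cells in the table above can be produced and read back; the helper names `netstring`, `netstring_list`, and `parse_netstring` are made up for this example, and nesting a list as a netstring of netstrings is the convention the table uses, not something the specification mandates.

```python
def netstring(payload: bytes) -> bytes:
    """Wrap raw bytes as <ASCII length>:<payload>, (with a trailing comma)."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def netstring_list(items) -> bytes:
    """Nest several byte strings inside one outer netstring."""
    return netstring(b"".join(netstring(item) for item in items))


def parse_netstring(data: bytes):
    """Return (payload, remaining bytes) for the first netstring in data."""
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing length prefix")
    length = int(length_text.decode("ascii"))
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:length], rest[length + 1:]


print(netstring(b"A to Z"))
print(netstring_list([b"true", b"", b"-42.1e7", b"A to Z"]))
```

Because the payload is opaque, the receiver must already know that `4:true,` stands for a boolean; the format itself carries no type information.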

Comparison of binary formats

Format Null Booleans Integer Floating-point String Array Associative array/Object
ASN.1
(BER, PER or OER encoding)
NULL type BOOLEAN:
  • BER: as 1 byte in binary form;
  • PER: as 1 bit;
  • OER: as 1 byte
INTEGER:
  • BER: variable-length big-endian binary representation (up to 2^(2^1024) bits);
  • PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
  • PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
  • OER: one, two, or four octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
REAL:

base-10 real values are represented as character strings in ISO 6093 format;

binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent;

the special values NaN, -INF, +INF, and negative zero are also supported

Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) user definable type
Bale[19] void

Encoded as zero bytes.

bool

True: 0x01 False: 0x00

  • u8, u16, u32, u64

Big-endian fixed size unsigned integers.

  • i8, i16, i32, i64

Big-endian two's complement signed integers.

  • uv

Variable length unsigned integer with a compact encoding.

f32, f64

Single and double precision big-endian floats.

string

Length-prefixed sequence of bytes.

  • array x

Length-prefixed sequence of any type.

  • n x

Fixed-length array.

  • tuple

Fixed size object, encoded as the concatenation of all members.

  • map k v

Key-value map of arbitrary size, encoded as an array of pairs.

Binn[20] \x00 True: \x01
False: \x02
big-endian 2's complement signed and unsigned 8/16/32/64 bits single: big-endian binary32
double: big-endian binary64
UTF-8 encoded, null terminated, preceded by int8 or int32 string length in bytes Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs
Bintoken[21] 0x82 True: 0x81
False: 0x80
Single byte integers in the range [-32;127]

Fixed length integers for 8-bits, 16-bits, 32-bits, and 64-bits integers.

Encoded as two's complement little-endian values.

Little-endian IEEE single/double precision numbers. UTF-8 encoded type-length-value string. Balanced brackets with an optional array count. Arrays can be nested. Balanced brackets with an optional object count. Objects can be nested.
BSON[22] Null type – 0 bytes for value True: one byte \x01
False: \x00
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement double: little-endian binary64 UTF-8 encoded, preceded by int32 encoded string length in bytes BSON embedded document with numeric keys BSON embedded document
Concise Binary Object Representation (CBOR)[23] \xf6 True: \xf5
False: \xf4
Small non-negative number 0..23: \x00-\x17; small negative number -24..-1: \x20-\x37

8-bit: positive \x18\xhh, negative \x38\xhh
16-bit: positive \x19<uint16_t>, negative \x39<uint16_t>
32-bit: positive \x1A<uint32_t>, negative \x3A<uint32_t>
64-bit: positive \x1B<uint64_t>, negative \x3B<uint64_t>
Negative number x encoded as ~x (binary inversion) or as (-x-1)
Byte order – Big-endian

Typecode (one byte) + IEEE half/single/double Typecode with length (like integer coding) and content.

Bytestring and UTF-8 have different typecode

Typecode with count (like integer coding) and items Typecode with pairs count (like integer coding) and pairs
Efficient XML Interchange (EXI) xsi:nil element (1-4 bits depending on context) 1 bit. 0–12 bits (log2 of the range) for integers with defined ranges less than 4096. Extensible sequence of octets with infinite range for larger or undefined ranges. Also supports custom representations. Scalable floating point representation requiring 18 to 88 bits depending on magnitude. Also supports IEEE and custom representations. Length prefixed sequence of Unicode code points with partitioned string tables for efficient representation of repeated items. The length and code points are represented as variable length unsigned integers where values under 128 require 1 octet each. Also supports custom representations. Repeated elements or length-prefixed list of values. Also supports custom representations. Ordered (sequence) or unordered (all) group of named elements.
Fast Binary Encoding[24] Encoded optional types:
  • int32?

null: \x00
value: \x01

Single byte boolean:
  • bool

True: \x01
False: \x00

Little-endian encoded signed integer:
  • int8, int16, int32, int64

Little-endian encoded unsigned integer:

  • uint8, uint16, uint32, uint64
Little-endian encoded floats:
  • float

Little-endian encoded doubles:

  • double

Little-endian encoded decimals:

  • decimal

Single byte character:

  • char

Little-endian encoded 32 bit Unicode character:

  • wchar

UTF-8 encoded string, preceded by length (32 bit) in bytes:

  • string
Vector of any other type, preceded by its length (32 bit) and relative offset (32 bit)
  • Fixed size array: int32[10]
  • Dynamic vector: int32[]
  • Linked list: int32()
  • Set: int32!
Vector of (key, value) pairs, preceded by its length (32 bit) and relative offset (32 bit)
  • Ordered map: int32<string>
  • Unordered hash table: int32{string}
FlatBuffers[25] Encoded as absence of field in parent object True: one byte \x01
False: \x00
little-endian 2's complement signed and unsigned 8/16/32/64 bits floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by 32 bit integer length of string in bytes Vectors of any other type, preceded by 32 bit integer length of number of elements Tables (schema defined types) or Vectors sorted by key (maps / dictionaries)
MessagePack \xc0 True: \xc3
False: \xc2
Single byte "fixnum" (values -32..127)

or typecode (one byte) + big-endian (u)int8/16/32/64

Typecode (one byte) + IEEE single/double Typecode + up to 31 bytes
or
typecode + length as uint8/16/32 + bytes;
encoding is unspecified[26]
As "fixarray" (single-byte prefix + up to 15 array items)

or typecode (one byte) + 2–4 bytes length + array items

As "fixmap" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + 2–4 bytes length + key-value pairs

Named Binary Tag
Netstrings 0:, True: 1:1,

False: 1:0,

OGDL Binary
Property list
(binary format)
Protocol Buffers[27] Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)

Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded (n << 1) XOR (n >> 63)
Constant encoding length 32-bit: 32 bits in little-endian 2's complement
Constant encoding length 64-bit: 64 bits in little-endian 2's complement

floats: little-endian binary32

doubles: little-endian binary64

UTF-8 encoded, preceded by varint-encoded integer length of string in bytes Repeated value with the same tag N/A
Sereal 0x25 True: 0x3b
False: 0x3a
Single byte POS/NEG (values -16..15)

or typecode (one byte) + "varint" encoded variable length integer or typecode (one byte) + "zigzag" encoded variable length integer

Typecode (one byte) + IEEE single/double/quad As "SHORT_BINARY" (single-byte prefix + up to 31 raw bytes)

or typecode (one byte, including boolean UTF8-encoding flag) + "varint" encoded length + raw bytes

As "ARRAYREF" (single-byte prefix + up to 15 array items)

or typecode (one byte) + "varint" encoded length + array items

As "HASHREF" (single-byte prefix + up to 15 key-value pairs)

or typecode (one byte) + "varint" encoded length + key-value pairs. Distinguishes hashmaps from objects / class instances.

Smile \x21 True: \x23
False: \x22
Single byte "small" (values -16..15 encoded using \xc0 - \xdf),

zigzag-encoded varints (1–11 data bytes), or BigInteger

IEEE single/double, BigDecimal Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references Arbitrary-length heterogeneous arrays with end-marker Arbitrary-length key/value pairs with end-marker
Structured Data eXchange Formats (SDXF) big-endian signed 24-bit or 32-bit integer big-endian IEEE double either UTF-8 or ISO 8859-1 encoded list of elements with identical ID and size, preceded by array header with int16 length chunks can contain other chunks to arbitrary depth
Thrift
VelocyPack[28] 0x00 none,
0x18 null
True: 0x1a
False: 0x19
signed integers, little-endian, 1 to 8 bytes, 2's complement: 0x20-0x27 + int;

unsigned integers, little-endian, 1 to 8 bytes: 0x28-0x2f + uint;
small integers 0, 1, ... 9: 0x30-0x39;
small negative integers -6, -5, ..., -1: 0x3a-0x3f;
UTC-date in milliseconds since the epoch, little-endian, 2's complement: 0x1c + uint64

double IEEE-754, little-endian: 0x1b + uint64 equivalent;

positive long packed BCD-encoded float: 0xc8-0xcf + 8 bytes;
negative long packed BCD-encoded float: 0xd0-0xd7 + 8 bytes

UTF-8 string, 0–126 bytes length: 0x40-0xbe + 0..126 bytes;

variable length UTF-8 string, little-endian, unsigned integer, not zero-terminated and may contain zero bytes: 0xbf + 8 bytes byte-length + string

empty array: 0x01;

array without index table, all sub items 1/2/4/8 bytes byte-length;
array with 1/2/4/8 byte index table offsets, byte-length and number of sub values;
compact array, no index table: 0x13

empty object: 0x0a;

object with 1/2/4/8 byte index table offsets, sorted by attribute name, 1/2/4/8 byte byte-length and number of sub values;
object with 1/2/4/8 byte index table offsets, not sorted by attribute name, 1/2/4/8 byte byte-length and number of sub values;
compact object, no index table: 0x14
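The Protocol Buffers row in the table above describes signed integers as a varint encoding of a "ZigZag"-encoded value, (n << 1) XOR (n >> 31) for 32-bit inputs. A minimal sketch of both steps follows; the function names `zigzag32` and `varint` are chosen for this example and are not taken from any protobuf library.

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one: (n << 1) XOR (n >> 31)."""
    if not -(1 << 31) <= n < (1 << 31):
        raise ValueError("value does not fit in 32 bits")
    # Python's >> is arithmetic, so n >> 31 is -1 for negatives and 0 otherwise.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)


print(zigzag32(-1), zigzag32(1), zigzag32(-2))
print(varint(300).hex())
```

ZigZag keeps small negative numbers small after the mapping, so they still fit in one or two varint bytes instead of always taking the maximum length.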

Any XML-based representation can be compressed with, or generated directly as, Efficient XML Interchange (EXI), a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
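As a worked example of one row from the binary table above, the sketch below follows the CBOR integer rules listed there: values up to 23 fit in the initial byte, larger ones take a 1, 2, 4, or 8 byte big-endian argument, and a negative number x is stored as -x-1. The function name `cbor_int` is an assumption of this example, and only integers are handled.

```python
import struct


def cbor_int(n: int) -> bytes:
    """Encode an integer using CBOR major type 0 (unsigned) or 1 (negative)."""
    if n >= 0:
        major, arg = 0x00, n
    else:
        major, arg = 0x20, -1 - n  # negative x is stored as -x-1
    if arg <= 23:
        return bytes([major | arg])            # value lives in the initial byte
    if arg <= 0xFF:
        return bytes([major | 24, arg])        # \x18 / \x38 + 1 byte
    if arg <= 0xFFFF:
        return bytes([major | 25]) + struct.pack(">H", arg)  # \x19 / \x39
    if arg <= 0xFFFFFFFF:
        return bytes([major | 26]) + struct.pack(">I", arg)  # \x1a / \x3a
    if arg <= 0xFFFFFFFFFFFFFFFF:
        return bytes([major | 27]) + struct.pack(">Q", arg)  # \x1b / \x3b
    raise OverflowError("needs a bignum tag, which this sketch does not cover")


print(cbor_int(685230).hex())
print(cbor_int(-1000).hex())
```

The sample integer 685230 used throughout the syntax comparison needs the 32-bit form, since it exceeds 65535.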

References

  1. https://hapifhir.github.io/hapi-hl7v2/
  2. http://www.hl7.org/fhir/?ref=learnmore
  3. http://hapifhir.io/
  4. http://www.xml.com/pub/a/ws/2001/04/04/soap.html
  5. Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain’t Markup Language (YAML) Version 1.2". The Official YAML Web Site. http://yaml.org/spec/1.2/spec.html#id2708710. Retrieved 2012-02-10. 
  6. https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
  7. https://github.com/sandstorm-io/capnproto/blob/master/c++/src/capnp/serialize-text.h
  8. https://capnproto.org/capnp-tool.html#decoding-messages
  9. https://github.com/chronoxor/FastBinaryEncoding#json-serialization
  10. "Extensible Data Notation". https://github.com/edn-format/edn. 
  11. http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSPropertyList.html
  12. https://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man5/plist.5.html
  13. https://developer.apple.com/mac/library/documentation/CoreFoundation/Conceptual/CFPropertyLists/Articles/XMLTags.html#//apple_ref/doc/uid/20001172-CJBEJBHH
  14. "Transit Format". https://github.com/cognitect/transit-format. 
  15. "Null Language-Independent Type for YAML Version 1.1". YAML.org. 2005-01-18. http://yaml.org/type/null.html. Retrieved 2009-09-12. 
  16. "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. 2005-01-18. http://yaml.org/type/bool.html. Retrieved 2009-09-12. 
  17. "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. 2005-02-11. http://yaml.org/type/int.html. Retrieved 2009-09-12. 
  18. "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. 2005-01-18. http://yaml.org/type/float.html. Retrieved 2009-09-12. 
  19. https://github.com/ii8/bale
  20. https://github.com/liteserver/binn/blob/master/spec.md
  21. https://github.com/bintoken/bintoken
  22. http://bsonspec.org
  23. RFC 7049
  24. https://chronoxor.github.io/FastBinaryEncoding/documents/FBE.html
  25. https://google.github.io/flatbuffers/flatbuffers_internals.html
  26. https://github.com/msgpack/msgpack/blob/master/spec.md#formats-str
  27. https://developers.google.com/protocol-buffers/docs/encoding
  28. https://github.com/arangodb/velocypack/blob/master/VelocyPack.md