JData

Short description: JData Specification
Filename extension: .jdt, .jdb
Internet media type: application/json
Type code: TEXT and BINARY
Developed by: Qianqian Fang
Initial release: 25 July 2019[1]
Latest release: 1.0 Draft 2 (25 July 2019)[2]
Type of format: Data interchange
Extended from: JSON
Website: openjdata.org

JData is a lightweight open standard for data annotation and exchange, designed to represent general-purpose and scientific data structures using human-readable (text-based) JSON and (binary) UBJSON formats. The JData specification aims to simplify the exchange of hierarchical and complex data between programming languages such as MATLAB, Python and JavaScript. It defines a comprehensive list of JSON-compatible "name":value constructs to store a wide range of data structures, including scalars, N-dimensional arrays, sparse/complex-valued arrays, maps, tables, hashes, linked lists, trees and graphs, and supports optional data grouping and metadata for each data element. The generated data files are compatible with the JSON/UBJSON specifications and can be readily processed by most existing parsers. JData-defined annotation keywords also permit storage of strongly-typed binary data streams in JSON, data compression, linking and referencing.

History

The initial development of the JData annotation scheme started in 2011 as part of the development of the JSONLab Toolbox, a widely used open-source MATLAB/GNU Octave JSON reader/writer. The majority of the annotated N-D array constructs, such as _ArrayType_, _ArraySize_, and _ArrayData_, had been implemented in the early releases of JSONLab. In 2015, the first draft of the JData Specification was developed in the Iso2Mesh Wiki; since 2019, development of the specification has moved to GitHub.

Releases

JData Version 0.5

The v0.5 version of the JData specification is the first complete draft and public request-for-comment (RFC) of the specification, made available on May 15, 2019. This preview version of the specification supports a majority of the data structures related to scientific data and research, including N-D arrays, sparse and complex-valued arrays, binary data interface, data-record-level compression, hashes, tables, trees, linked lists and graphs. It also describes the general approach for data linking and referencing. The reference implementation of this specification version is released as JSONLab v1.8.

JData Version 1 Draft 1

Draft 1 of the JData specification Version 1 was released on June 4, 2019. The major changes in this release are: 1) the serialization order of N-D array elements changed from column-major to row-major, 2) the _ArrayData_ construct for complex N-D arrays changed from a 1-D vector to a two-row matrix, 3) support was added for non-string valued keys in the hash data JSON representation, and 4) a new _ByteStream_ object was added to serialize generic binary data or binary large objects (BLOBs). The reference implementation of this specification version is released as JSONLab v1.9.

JData Version 1 Draft 2

Draft 2 of the JData specification Version 1 was released on July 25, 2019. The major changes in this release are: 1) support was added for storing special matrices via the _ArrayShape_ tag, 2) all _ArrayCompression*_ tags were renamed to _ArrayZip*_, and 3) dedicated table data keywords were added: _TableCols_, _TableRows_, and _TableRecords_. The reference implementation of this specification version is released as JSONLab v2.0.

JData annotation examples

Numerical scalars

Numerical values are directly supported by both the JSON and UBJSON specifications. A numerical value is typically unchanged when converted to the JData annotation, and is stored in a file directly in the JSON/UBJSON numerical value form. For example:

Native data | text-JData/JSON form | binary-JData (BJData/UBJSON)
a=3.14159 | {"a":3.14159} | [{] [U][1][a][D][3.14159] [}]

Special constants and strings

There are a few special constants, namely "NaN", "Infinity" and "-Infinity". They are encoded as special string keywords when stored in the JSON/text-JData formats, but stay unchanged when stored in the binary JData format.

Native data | text-JData/JSON form | binary-JData (BJData/UBJSON)
a=nan | {"a":"_NaN_"} | [{] [U][1][a][D][nan] [}]
a=inf | {"a":"_Inf_"} | [{] [U][1][a][D][inf] [}]
a=-inf | {"a":"-_Inf_"} | [{] [U][1][a][D][-inf] [}]
a=true | {"a":true} | [{] [U][1][a][T] [}]
a=false | {"a":false} | [{] [U][1][a][F] [}]
a=null | {"a":null} | [{] [U][1][a][Z] [}]
a="A string" | {"a":"A string"} | [{] [U][1][a][S][U][8][A string] [}]
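The keyword mapping for the text form can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `encode_special` and `decode_special` are hypothetical and not part of any JData library, and only the three keywords listed in the table above are handled.

```python
import json
import math

# Keywords taken from the table above; the helper names are hypothetical.
_KEYWORDS = {"_NaN_": math.nan, "_Inf_": math.inf, "-_Inf_": -math.inf}

def encode_special(value):
    """Replace non-finite floats with JData string keywords; pass other values through."""
    if isinstance(value, float):
        if math.isnan(value):
            return "_NaN_"
        if math.isinf(value):
            return "_Inf_" if value > 0 else "-_Inf_"
    return value

def decode_special(value):
    """Turn a JData keyword string back into a float; pass other values through."""
    if isinstance(value, str) and value in _KEYWORDS:
        return _KEYWORDS[value]
    return value

record = {"a": float("-inf"), "b": 3.14159, "c": "A string"}
text = json.dumps({k: encode_special(v) for k, v in record.items()})
restored = {k: decode_special(v) for k, v in json.loads(text).items()}
```

Encoding before `json.dumps` keeps the output strict JSON, since Python's default serializer would otherwise emit the non-standard tokens `NaN` and `Infinity`.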


Structures and hashes

Hierarchical structures are often needed when representing metadata or simple lists with named members. Because the "structure" data type can be directly mapped to the "object" construct in JSON and UBJSON, structures do not need to be converted when using the JData annotation.

Native data text-JData/JSON form binary-JData(BJData/UBJSON)

a=struct(
  'i1',1,
  'f1',2.0,
  's1',"string"
)

➡️

{
  "a":{
  "i1":1,
  "f1":2.0,
  "s1":"string"
 }
}

[{]
[U][1][a]
[{]
[U][2][i1][U][1]
[U][2][f1][D][2.0]
[U][2][s1][S][U][6][string]
[}]
[}]
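The binary column above can be reproduced with a small encoder. The following is a minimal sketch rather than a full implementation: it assumes UBJSON's big-endian byte order for the [D] marker (BJData may use a different byte order), handles only the value types shown in the examples so far, and uses the hypothetical function names `ubjson_key` and `ubjson_value`.

```python
import struct

def ubjson_key(name):
    """Object keys are written as a [U] length plus the raw bytes, without an [S] marker."""
    raw = name.encode("utf-8")
    return b"U" + bytes([len(raw)]) + raw  # raises ValueError for keys over 255 bytes

def ubjson_value(value):
    """Encode a small subset of values: null, bool, uint8, float64, short string, object."""
    if value is None:
        return b"Z"
    if isinstance(value, bool):          # test bool before int: bool is an int subclass
        return b"T" if value else b"F"
    if isinstance(value, int) and 0 <= value <= 255:
        return b"U" + bytes([value])
    if isinstance(value, float):
        return b"D" + struct.pack(">d", value)  # assumption: big-endian, as in UBJSON
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"S" + b"U" + bytes([len(raw)]) + raw
    if isinstance(value, dict):
        body = b"".join(ubjson_key(k) + ubjson_value(v) for k, v in value.items())
        return b"{" + body + b"}"
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

encoded = ubjson_value({"a": {"i1": 1, "f1": 2.0, "s1": "string"}})
```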


2-D arrays in the array format

Simple 1-dimensional vectors are supported in both JSON and UBJSON using the "array" construct, and a 2-D array can be stored as an array of such row vectors. For example:

Native data text-JData/JSON form binary-JData(BJData/UBJSON)
a=[
  1,2,3
  4,5,6
]
➡️
{
 "a":[
  [1,2,3],
  [4,5,6]
 ]
}
[{]
[U][1][a]
[[]
[[] [U][1][U][2][U][3] []]
[[] [U][4][U][5][U][6] []]
[]]
[}]
As with a 1-D row vector, the type [$] and count [#] markers can be used to simplify this array in the binary form:
[{]
[U][1][a]
[[]
[[] [$][U] [#][U][3] [1][2][3]
[[] [$][U] [#][U][3] [4][5][6]
[]]
[}]
To simplify this further, the JData Specification extends the UBJSON array count marker [#] to accept a 1-D array count-type, representing the dimension vector of an N-D array, in this case [2,3] for a 2x3 matrix:

[{]
[U][1][a]
[[] [$][U] [#][[] [$][U][#][2] [2][3]
[1][2][3][4][5][6]
[}]

2-D arrays in the annotated format

The JData specification introduces a lightweight data annotation approach that allows one to specify additional information, such as data type, data size and compression, in the stored data record. This is achieved using a "structure-like" data container (a structure is supported in almost all programming languages) with JData-specified human-readable subfield keywords. This construct is also easily serialized using many of the existing JSON/UBJSON libraries.

For example, the above 2-D array can alternatively be stored using the annotated format to allow fine-grained data storage:

Native data text-JData/JSON form binary-JData(BJData/UBJSON)

a=[
  1,2,3
  4,5,6
]

➡️

{
 "a":{
  "_ArrayType_":"uint8",
  "_ArraySize_":[2,3],
  "_ArrayData_":[1,2,3,4,5,6]
 }
}

[{]
[U][1][a]
[{]
[U][11][_ArrayType_] [S][U][5][uint8]
[U][11][_ArraySize_] [[] [U][2][U][3] []]
[U][11][_ArrayData_] [[] [$][U][#][6] [1][2][3][4][5][6]
[}]
[}]
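Converting a nested list into the annotated form amounts to recording its dimensions and flattening it in row-major order. A minimal pure-Python sketch follows; the function names are hypothetical, and the input is assumed to be a non-empty rectangular nested list whose element type is supplied by the caller.

```python
def shape_of(nested):
    """Return the dimension vector of a rectangular nested list, e.g. [2, 3]."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]  # assumption: non-empty and rectangular
    return dims

def flatten(nested):
    """Flatten a nested list in row-major order (last index varies fastest)."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat

def annotate_array(nested, array_type):
    """Build the _ArrayType_/_ArraySize_/_ArrayData_ record for a nested list."""
    return {
        "_ArrayType_": array_type,
        "_ArraySize_": shape_of(nested),
        "_ArrayData_": flatten(nested),
    }

record = {"a": annotate_array([[1, 2, 3], [4, 5, 6]], "uint8")}
```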

3-D and higher dimensional array

One can use either the direct format or the annotated format to store higher dimensional arrays, as both are natively supported by JSON/UBJSON. However, the annotated format for text-based JData and the packed-array optimized format for binary JData become more advantageous for such arrays because they can be processed faster.

Native data ➡️ text-JData/JSON form binary-JData(BJData/UBJSON)
a=
[
 [
  [1,9,6,0],
  [2,9,3,1],
  [8,0,9,6]
 ],
 [
  [6,4,2,7],
  [8,5,1,2],
  [3,3,2,6]
 ]
]
{
  "a":[
        [
          [1,9,6,0],
          [2,9,3,1],
          [8,0,9,6]
        ],
        [
          [6,4,2,7],
          [8,5,1,2],
          [3,3,2,6]
        ]
    ]
}
[{]
  [U][1][a]
  [[]
    [[]
      [[] [U][1][U][9][U][6][U][0] []]
      [[] [U][2][U][9][U][3][U][1] []]
      [[] [U][8][U][0][U][9][U][6] []]
    []]
    [[]
      [[] [U][6][U][4][U][2][U][7] []]
      [[] [U][8][U][5][U][1][U][2] []]
      [[] [U][3][U][3][U][2][U][6] []]
    []]
  []]
[}]
More efficient alternative formats using JData annotations
{
  "a":{
        "_ArrayType_":"uint8",
        "_ArraySize_":[2,3,4],
        "_ArrayData_":[1,9,6,0,2,9,3,1,8,0,9,6,6,4,2,7,8,5,1,2,3,3,2,6]
  }
}
[{]
  [U][1][a]
  [[] [$][U] [#][[] [$][U][#][3] [2][3][4]
     [1][9][6][0][2][9][3][1][8][0][9][6][6][4][2]
     [7][8][5][1][2][3][3][2][6]
[}]
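Reading the annotated form back is the reverse operation: the flat _ArrayData_ vector is split recursively according to _ArraySize_, with the last dimension varying fastest. The sketch below uses hypothetical function names and assumes all dimensions are positive.

```python
import math

def unflatten(data, size):
    """Rebuild a nested list from row-major flat data and a dimension vector."""
    if len(size) == 1:
        return list(data)
    step = len(data) // size[0]  # number of elements in each sub-array
    return [unflatten(data[i * step:(i + 1) * step], size[1:]) for i in range(size[0])]

def decode_annotated(record):
    """Validate the element count, then rebuild the N-D nested list."""
    size, data = record["_ArraySize_"], record["_ArrayData_"]
    if math.prod(size) != len(data):
        raise ValueError("_ArrayData_ length does not match _ArraySize_")
    return unflatten(data, size)

annotated = {
    "_ArrayType_": "uint8",
    "_ArraySize_": [2, 3, 4],
    "_ArrayData_": [1, 9, 6, 0, 2, 9, 3, 1, 8, 0, 9, 6,
                    6, 4, 2, 7, 8, 5, 1, 2, 3, 3, 2, 6],
}
nested = decode_annotated(annotated)
```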

Array data with compression

JData annotations support data compression to save space. Several additional keywords are needed: "_ArrayZipType_", the compression method used; "_ArrayZipSize_", the dimension vector of the "preprocessed" data stored in the "_ArrayData_" construct before compression; and "_ArrayZipData_", the compressed data byte-stream. For example:

Native data text-JData/JSON form binary-JData(BJData/UBJSON)

a=[
  1,2,3
  4,5,6
]

➡️

{
 "a":{
  "_ArrayType_":"uint8",
  "_ArraySize_":[2,3],
  "_ArrayZipType_":"zlib",
  "_ArrayZipSize_":[1,6],
  "_ArrayZipData_":"eJxjZGJmYWUDAAA+ABY="
 }
}

[{]
[U][1][a]
[{]
[U][11][_ArrayType_] [S][U][5][uint8]
[U][11][_ArraySize_] [[] [U][2][U][3] []]
[U][14][_ArrayZipType_] [S][U][4][zlib]
[U][14][_ArrayZipSize_] [[] [U][1][U][6] []]
[U][14][_ArrayZipData_] [[] [$][U][#][14] [...compressed byte stream...]
[}]
[}]
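For the "zlib" method in the text form, the _ArrayZipData_ string in the example above is consistent with the raw array bytes being zlib-compressed and then Base64-encoded. A short sketch using only the Python standard library illustrates this; the helper names are hypothetical.

```python
import base64
import zlib

def zip_array_bytes(raw):
    """Compress raw array bytes with zlib and wrap them in Base64 for the JSON form."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")

def unzip_array_bytes(text):
    """Reverse of zip_array_bytes: Base64-decode, then zlib-decompress."""
    return zlib.decompress(base64.b64decode(text))

# The _ArrayZipData_ string from the example above decodes to six uint8 values.
values = list(unzip_array_bytes("eJxjZGJmYWUDAAA+ABY="))

# Round trip for arbitrary bytes.
payload = bytes(range(10))
restored = unzip_array_bytes(zip_array_bytes(payload))
```

The compressed stream in the example is 14 bytes long, which matches the [#][14] count shown in the binary form.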

Complex-number and complex-valued arrays

A complex-valued data record must be stored using the "annotated array format". This is achieved via the presence of the _ArrayIsComplex_ keyword and the serialization of the complex values in the _ArrayData_ construct in the order of [[serialized real-part values], [serialized imag-part values]].

Native data text-JData/JSON form binary-JData(BJData/UBJSON)
a=10.0+6.0j
➡️
{
 "a":{
  "_ArrayType_":"double",
  "_ArraySize_":[1,1],
  "_ArrayIsComplex_":true,
  "_ArrayData_":[[10.0],[6.0]]
 }
}
[{]
[U][1][a]
[{]
[U][11][_ArrayType_] [S][U][6][double]
[U][11][_ArraySize_] [[] [U][1][U][1] []]
[U][16][_ArrayIsComplex_] [T]
[U][11][_ArrayData_] [[] [[] [D][10.0] []] [[] [D][6.0] []] []]
[}]
[}]
a=[
  1+2j,3+4j
  5+6j,7+8j
]
➡️
{
 "a":{
  "_ArrayType_":"uint8",
  "_ArraySize_":[2,2],
  "_ArrayIsComplex_":true,
  "_ArrayData_":[[1,3,5,7],[2,4,6,8]]
 }
}
[{]
[U][1][a]
[{]
[U][11][_ArrayType_] [S][U][5][uint8]
[U][11][_ArraySize_] [[] [U][2][U][2] []]
[U][16][_ArrayIsComplex_] [T]
[U][11][_ArrayData_] [[]
[[] [$][U][#][4] [1][3][5][7]
[[] [$][U][#][4] [2][4][6][8]
[]]
[}]
[}]
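The two-row layout can be illustrated with Python's built-in complex type. This is a sketch under stated assumptions: the function names are hypothetical, and the input is taken to be a flat, row-major list of complex values together with its dimension vector.

```python
def encode_complex(values, shape, array_type="double"):
    """Split flat row-major complex values into a real row and an imaginary row."""
    return {
        "_ArrayType_": array_type,
        "_ArraySize_": list(shape),
        "_ArrayIsComplex_": True,
        "_ArrayData_": [[v.real for v in values], [v.imag for v in values]],
    }

def decode_complex(record):
    """Recombine the two rows of _ArrayData_ into a flat list of complex values."""
    real, imag = record["_ArrayData_"]
    return [complex(r, i) for r, i in zip(real, imag)]

# The 2x2 example above, flattened in row-major order.
flat = [1 + 2j, 3 + 4j, 5 + 6j, 7 + 8j]
record = encode_complex(flat, [2, 2], "uint8")
```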

Sparse arrays

Native data text-JData/JSON form binary-JData(BJData/UBJSON)
a=sparse(5,4);
a(1,1)=2.0;
a(2,3)=9.0;
a(4,2)=7.0;
➡️
{
 "a":{
  "_ArrayType_":"double",
  "_ArraySize_":[5,4],
  "_ArrayIsSparse_":true,
  "_ArrayData_":[[1,2,4],[1,3,2],[2.0,9.0,7.0]]
 }
}
[{]
[U][1][a]
[{]
[U][11][_ArrayType_] [S][U][6][double]
[U][11][_ArraySize_] [[] [U][5][U][4] []]
[U][15][_ArrayIsSparse_] [T]
[U][11][_ArrayData_] [[]
[[][$][U][#][3] [1][2][4]
[[][$][U][#][3] [1][3][2]
[[][$][D][#][3] [2.0][9.0][7.0]
[]]
[}]
[}]
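In the example above, _ArrayData_ holds three rows: the row indices, the column indices, and the nonzero values, with 1-based indices as in the native MATLAB code. A minimal sketch of that layout follows; the function names are hypothetical, and a dictionary keyed by (row, column) stands in for a native sparse matrix.

```python
def encode_sparse(entries, shape, array_type="double"):
    """entries maps 1-based (row, col) tuples to nonzero values."""
    keys = sorted(entries)  # order by row, then column
    return {
        "_ArrayType_": array_type,
        "_ArraySize_": list(shape),
        "_ArrayIsSparse_": True,
        "_ArrayData_": [
            [r for r, _ in keys],
            [c for _, c in keys],
            [entries[k] for k in keys],
        ],
    }

def decode_sparse(record):
    """Rebuild the {(row, col): value} dictionary from the three rows."""
    rows, cols, vals = record["_ArrayData_"]
    return {(r, c): v for r, c, v in zip(rows, cols, vals)}

entries = {(1, 1): 2.0, (2, 3): 9.0, (4, 2): 7.0}
record = encode_sparse(entries, [5, 4])
```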

Complex-valued sparse arrays

Native data text-JData/JSON form binary-JData(BJData/UBJSON)
a=sparse(5,4);
a(1,1)=2.0+1.2j;
a(2,3)=9.0-4.7j;
a(4,2)=7.0+1.0j;
➡️
{
 "a":{
  "_ArrayType_":"double",
  "_ArraySize_":[5,4],
  "_ArrayIsComplex_":true,
  "_ArrayIsSparse_":true,
  "_ArrayData_":[[1,2,4],[1,3,2],
[2.0,9.0,7.0],[1.2,-4.7,1.0]]
 }
}
[{]
[U][1][a]
[{]
[U][11][_ArrayType_] [S][U][6][double]
[U][11][_ArraySize_] [[] [U][5][U][4] []]
[U][16][_ArrayIsComplex_] [T]
[U][15][_ArrayIsSparse_] [T]
[U][11][_ArrayData_] [[]
[[][$][U][#][3] [1][2][4]
[[][$][U][#][3] [1][3][2]
[[][$][D][#][3] [2.0][9.0][7.0]
[[][$][D][#][3] [1.2][-4.7][1.0]
[]]
[}]
[}]

Tables

Native data ➡️ text-JData/JSON form binary-JData(BJData/UBJSON)
A table without row-name
Name    Age Degree Height
----    --- ------ ------
Andy    21  BS     69.2
William 21  MS     71.0
Om      22  BS     67.1
{
  "_TableCols_": ["Name", "Age", "Degree", "Height"],
  "_TableRows_": [],
  "_TableRecords_": [
    ["Andy",    21, "BS", 69.2],
    ["William", 21, "MS", 71.0],
    ["Om",      22, "BS", 67.1]
  ]
}
[{]
[U][11][_TableCols_] [[]
[S][U][4][Name] [S][U][3][Age]
[S][U][6][Degree] [S][U][6][Height]
[]]
[U][11][_TableRows_] [[] []]
[U][14][_TableRecords_] [[]
[[] [S][U][4][Andy] [U][21] [S][U][2][BS] [d][69.2] []]
[[] [S][U][7][William] [U][21] [S][U][2][MS] [d][71.0] []]
[[] [S][U][2][Om] [U][22] [S][U][2][BS] [d][67.1] []]
[]]
[}]
specifying column data types
{
  "_TableCols_": [
    {"DataName":"Name",
     "DataType":"string"
    },
    {"DataName":"Age",
     "DataType":"int32"
    },
    {"DataName":"Degree",
     "DataType":"string"
    },
    {"DataName":"Height",
     "DataType":"single"
    }
  ],
  "_TableRows_": [],
  "_TableRecords_": [
    ["Andy",    21, "BS", 69.2],
    ["William", 21, "MS", 71.0],
    ["Om",      22, "BS", 67.1]
  ]
}
[{]
[U][11][_TableCols_] [[]
[{] [U][8][DataName] [S][U][4][Name]
[U][8][DataType] [S][U][6][string]
[}]
[{] [U][8][DataName] [S][U][3][Age]
[U][8][DataType] [S][U][5][int32]
[}]
[{] [U][8][DataName] [S][U][6][Degree]
[U][8][DataType] [S][U][6][string]
[}]
[{] [U][8][DataName] [S][U][6][Height]
[U][8][DataType] [S][U][6][single]
[}]
[]]
[U][11][_TableRows_] [[] []]
[U][14][_TableRecords_] [[]
[[] [S][U][4][Andy] [U][21] [S][U][2][BS] [d][69.2] []]
[[] [S][U][7][William] [U][21] [S][U][2][MS] [d][71.0] []]
[[] [S][U][2][Om] [U][22] [S][U][2][BS] [d][67.1] []]
[]]
[}]
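A reader of the table keywords typically wants one record per row, keyed by column name. The sketch below accepts both column styles shown above (plain names, or objects with "DataName"/"DataType"); the function names are hypothetical, and the type information is returned as-is rather than enforced.

```python
def column_names(table):
    """Return column names from either plain strings or {"DataName": ...} objects."""
    return [c["DataName"] if isinstance(c, dict) else c for c in table["_TableCols_"]]

def column_types(table):
    """Return the declared "DataType" per column, or None where none is given."""
    return [c.get("DataType") if isinstance(c, dict) else None for c in table["_TableCols_"]]

def table_to_records(table):
    """Convert _TableRecords_ rows into a list of {column: value} dictionaries."""
    names = column_names(table)
    return [dict(zip(names, row)) for row in table["_TableRecords_"]]

table = {
    "_TableCols_": [
        {"DataName": "Name", "DataType": "string"},
        {"DataName": "Age", "DataType": "int32"},
        {"DataName": "Degree", "DataType": "string"},
        {"DataName": "Height", "DataType": "single"},
    ],
    "_TableRows_": [],
    "_TableRecords_": [
        ["Andy", 21, "BS", 69.2],
        ["William", 21, "MS", 71.0],
        ["Om", 22, "BS", 67.1],
    ],
}
records = table_to_records(table)
```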

Trees

Native data ➡️ text-JData/JSON form binary-JData(BJData/UBJSON)
a tree data structure
root={id:0,data:10.1}
 ├── node1={id:1,data:2.5}
 ├── node2={id:2,data:100}
 │    ├── node2.1={id:3,data:9}
 │    └── node2.2={id:4,data:20.1}
 └── node3={id:5,data:-9.0}
{
   "_TreeNode_(root)":
       {"id":0,"data":10.1},
   "_TreeChildren_": [
       {"_TreeNode_(node1)":
          {"id":1,"data":2.5}
       },
       {
          "_TreeNode_(node2)": 
             {"id":2,"data":100},
          "_TreeChildren_": [
             {"_TreeNode_(node2.1)":
                {"id":3,"data":9}
             },
             {"_TreeNode_(node2.2)": 
                {"id":4,"data":20.1}
             }
          ]
       },
       {"_TreeNode_(node3)":
          {"id":5,"data":-9.0}
       }
   ]
 }
[{]
  [U][16][_TreeNode_(root)] [{] 
    [U][2][id] [l][0] [U][4][data] [d][10.1]
  [}]
  [U][14][_TreeChildren_] [[]
   [{] [U][17][_TreeNode_(node1)]
     [{] [U][2][id] [l][1] [U][4][data] [d][2.5][}]
   [}]
   [{] [U][17][_TreeNode_(node2)]
     [{] [U][2][id] [l][2] [U][4][data] [d][100][}]
       [U][14][_TreeChildren_] [[]
         [{] [U][19][_TreeNode_(node2.1)]
           [{] [U][2][id] [l][3] [U][4][data][d][9][}]
         [}]
         [{] [U][19][_TreeNode_(node2.2)]
           [{][U][2][id][l][4][U][4][data][d][20.1][}]
         [}]
       []]
   [}]
   [{] [U][17][_TreeNode_(node3)]
      [{] [U][2][id] [l][5] [U][4][data] [d][-9.0][}]
   [}]
  []]
[}]
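Because each node name is embedded in its key as _TreeNode_(name), a reader has to extract it while walking the _TreeChildren_ arrays. The following depth-first traversal is a sketch of one way to do that; `walk_tree` is a hypothetical helper, not part of any JData library.

```python
PREFIX = "_TreeNode_("

def walk_tree(node, depth=0):
    """Yield (depth, name, payload) for every node, parents before children."""
    for key, value in node.items():
        if key.startswith(PREFIX) and key.endswith(")"):
            yield depth, key[len(PREFIX):-1], value
    for child in node.get("_TreeChildren_", []):
        yield from walk_tree(child, depth + 1)

tree = {
    "_TreeNode_(root)": {"id": 0, "data": 10.1},
    "_TreeChildren_": [
        {"_TreeNode_(node1)": {"id": 1, "data": 2.5}},
        {
            "_TreeNode_(node2)": {"id": 2, "data": 100},
            "_TreeChildren_": [
                {"_TreeNode_(node2.1)": {"id": 3, "data": 9}},
                {"_TreeNode_(node2.2)": {"id": 4, "data": 20.1}},
            ],
        },
        {"_TreeNode_(node3)": {"id": 5, "data": -9.0}},
    ],
}
visited = list(walk_tree(tree))
```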

Graphs

Native data ➡️ text-JData/JSON form binary-JData(BJData/UBJSON)
a directed graph object
head ={id:0,data:10.1}
    ⇓ e1
┌─node1={id:1,data:2.5}
│   ⇓ e2
│ node2={id:2,data:100}─┐
│   ⇓ e3                │
└➝node3={id:3,data:9} e7│
e6  ⇓ e4                │
  node4={id:4,data:20.1}↲
    ⇓ e5
  tail ={id:5,data:-9.0}
{
   "_GraphNodes_":{
     "head": {"id":0,"data":10.1},
     "node1":{"id":1,"data":2.5 },
     "node2":{"id":2,"data":100 },
     "node3":{"id":3,"data":9   },
     "node4":{"id":4,"data":20.1},
     "tail": {"id":5,"data":-9.0}
   },
   "_GraphEdges_":[
     ["head", "node1","e1"],
     ["node1","node2","e2"],
     ["node2","node3","e3"],
     ["node3","node4","e4"],
     ["node4","tail", "e5"],
     ["node1","node3","e6"],
     ["node2","node4","e7"]
   ]
 }
[{]
  [U][12][_GraphNodes_] [{]
     [U][4][head] [{] [U][2][id] [l][0] [U][4][data] [d][10.1] [}]
     [U][5][node1][{] [U][2][id] [l][1] [U][4][data] [d][2.5]  [}]
     [U][5][node2][{] [U][2][id] [l][2] [U][4][data] [d][100]  [}]
     [U][5][node3][{] [U][2][id] [l][3] [U][4][data] [d][9]    [}]
     [U][5][node4][{] [U][2][id] [l][4] [U][4][data] [d][20.1] [}]
     [U][4][tail] [{] [U][2][id] [l][5] [U][4][data] [d][-9.0] [}]
  [}]
  [U][12][_GraphEdges_] [[]
     [[] [S][U][4][head]  [S][U][5][node1] [S][U][2][e1] []]
     [[] [S][U][5][node1] [S][U][5][node2] [S][U][2][e2] []]
     [[] [S][U][5][node2] [S][U][5][node3] [S][U][2][e3] []]
     [[] [S][U][5][node3] [S][U][5][node4] [S][U][2][e4] []]
     [[] [S][U][5][node4] [S][U][4][tail]  [S][U][2][e5] []]
     [[] [S][U][5][node1] [S][U][5][node3] [S][U][2][e6] []]
     [[] [S][U][5][node2] [S][U][5][node4] [S][U][2][e7] []]
  []]
[}]
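Each _GraphEdges_ entry in the example is a [source, target, edge-name] triple, so building an adjacency list takes a single pass. The sketch below assumes _GraphNodes_ is a name-to-payload object, as in the binary form above; the helper names are hypothetical.

```python
from collections import defaultdict

def adjacency(graph):
    """Map each source node name to a list of (target, edge_name) pairs."""
    adj = defaultdict(list)
    for src, dst, label in graph["_GraphEdges_"]:
        adj[src].append((dst, label))
    return dict(adj)

def reachable(graph, start):
    """Return the set of node names reachable from start along directed edges."""
    adj, seen, stack = adjacency(graph), set(), [start]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(dst for dst, _ in adj.get(name, []))
    return seen

graph = {
    "_GraphNodes_": {
        "head": {"id": 0, "data": 10.1}, "node1": {"id": 1, "data": 2.5},
        "node2": {"id": 2, "data": 100}, "node3": {"id": 3, "data": 9},
        "node4": {"id": 4, "data": 20.1}, "tail": {"id": 5, "data": -9.0},
    },
    "_GraphEdges_": [
        ["head", "node1", "e1"], ["node1", "node2", "e2"], ["node2", "node3", "e3"],
        ["node3", "node4", "e4"], ["node4", "tail", "e5"], ["node1", "node3", "e6"],
        ["node2", "node4", "e7"],
    ],
}
neighbours = adjacency(graph)
```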

Software ecosystem

Text-based JData files are plain JSON files and can be readily parsed by most existing JSON parsers. JSON files that contain JData annotation tags are recommended to use the .jdt suffix, although they can also be saved as .json. A few slight differences exist between a .jdt and a .json file, including

  1. A JData .jdt file accepts multiple concatenated JSON objects inside a single file
  2. JData .jdt strings accept new-lines inside a string, while the JSON specification requires new-line characters to be encoded as "\n"; most JSON parsers can process new-lines in the string via a "relaxed" parsing mode.
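Both differences can be handled with Python's standard json module, shown here as a sketch rather than a reference reader. `JSONDecoder.raw_decode` returns the parsed value together with the index where it ended, which allows stepping through concatenated objects, and `strict=False` permits literal control characters such as new-lines inside strings. The generator name `iter_jdt_objects` is hypothetical.

```python
import json

def iter_jdt_objects(text):
    """Yield each top-level JSON value found in a string of concatenated values."""
    decoder = json.JSONDecoder(strict=False)  # allow raw new-lines inside strings
    pos, end = 0, len(text)
    while True:
        # Skip whitespace between values; raw_decode does not do this itself.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Two concatenated objects; the second string contains a literal new-line.
sample = '{"a":1}\n{"b":"line1\nline2"}'
objects = list(iter_jdt_objects(sample))
```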

The binary interface of the JData specification is defined via the Binary JData (BJData) specification, a format largely derived from the UBJSON Specification Draft 12. The BJData format contains three extended features compared to UBJSON: 1) BJData introduces 4 new data markers ([u] for "uint16", [m] for "uint32", [M] for "uint64", and [h] for "float16") that were not supported in UBJSON, 2) BJData introduces an optimized typed N-D array container, and 3) BJData stops mapping NaN/Infinity to null ([Z]) and instead uses their respective IEEE754 representations.
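The four added markers correspond to fixed-width types that Python's struct module can pack directly. The sketch below only illustrates the marker-to-type mapping; the function name `pack_scalar` is hypothetical, and the byte order is left as an explicit parameter because it is not stated here and should be taken from the BJData specification version in use.

```python
import struct

# Marker -> (type name from the text above, struct format character).
BJDATA_EXTRA_MARKERS = {
    "u": ("uint16", "H"),
    "m": ("uint32", "I"),
    "M": ("uint64", "Q"),
    "h": ("float16", "e"),
}

def pack_scalar(marker, value, byteorder):
    """Return the marker byte followed by the packed value.

    byteorder is "<" (little-endian) or ">" (big-endian); which one applies
    is an assumption the caller must settle from the specification.
    """
    _, code = BJDATA_EXTRA_MARKERS[marker]
    return marker.encode("ascii") + struct.pack(byteorder + code, value)

big = pack_scalar("u", 258, ">")
little = pack_scalar("u", 258, "<")
```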

A lightweight Python JData encoder/decoder, pyjdata,[3] is available on PyPI, Debian/Ubuntu and GitHub. It can convert a wide range of complex data structures, including dict, array and numpy ndarray, into JData representations and export the data as JSON or UBJSON files. The BJData Python module, pybj,[4] which enables reading and writing BJData/UBJSON files, is also available on PyPI, Debian/Ubuntu and GitHub.

For MATLAB and GNU Octave, JSONLab v2.0 is the reference implementation for the latest JData specification, and is available on Debian/Ubuntu, Fedora, and GitHub. The JSONLab toolbox is also distributed via MATLAB File Exchange, where it is among the most popular downloaded packages and was named a Popular File in 2018.

For JavaScript, a JData encoder/decoder named jsdata has been developed to process JData encoded files in web pages. A prominent application of jsdata is MCX Cloud,[5] an NIH-funded cloud-based Monte Carlo photon transport simulation platform.

Compact functions for encoding/decoding JSON files containing JData-annotations have also been implemented in C/C++ as part of the Monte Carlo eXtreme photon transport simulator.
