InfinityDB

From HandWiki

InfinityDB is an all-Java embedded database engine and client/server DBMS with an extended java.util.concurrent.ConcurrentNavigableMap interface (a subinterface of java.util.Map) that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-tree architecture that enables client programmers to reach high levels of performance without risk of failures.[1]

A new Client/Server version 5.0 is in alpha testing, wrapping the established embedded version to provide shared access via a secure, remote server.

In the embedded system, data is stored to and retrieved from a single embedded database file using the InfnityDB API that allows direct access to the variable length item spaces. Database client programmers can construct traditional relations as well as specialized models that directly satisfy the needs of the dependent application. There is no limit to the number of items, database size, or JVM size, so InfinityDB can function in both the smallest environment that provides random access storage and can be scaled to large settings. Traditional relations and specialized models can be directed to the same database file. InfinityDB can be optimized for standard relations as well as all other types of data, allowing client applications to perform at a minimum of one million operations per second on a virtual, 8-core system.

AirConcurrentMap, is an in-memory map that implements the Java ConcurrentMap interface,[2] but internally it uses a multi-core design so that its performance and memory make it the fastest Java Map when ordering is performed and it holds medium to large numbers of entries.[3] AirConcurrentMap iteration is faster than any Java Map iterators, regardless of the specific map type.

Map API

InfinityDB can be accessed as an extended standard java.util.concurrent.ConcurrentNavigableMap, or via a low-level 'ItemSpace' API. The ConcurrentNavigableMap interface is a subinterface of java.util.Map but has special ordering and concurrency methods: this is the same API as java.util.concurrent.ConcurrentSkipListMap. Maps may be nested to form complex structures. The Maps have the standard semantics, but work internally on a 'tuple space', while the Maps are not actually stored but are helpers, each representing nothing more than an immutable tuple prefix. Maps may be created dynamically at high speed if needed for access, and are thread-safe and multi-core concurrent. The key and value types available include all Java primitive data types, Dates, Strings, small char or byte arrays, 'ByteStrings', 'huge array' indexes, Character Long Objects or Binary Long Objects, plus the special-purpose types 'EntityClass' and 'Attribute'. Maps may be multi-valued. Applications may choose to use the Map-based access alone and may mix in lower-level 'ItemSpace' access over the same tuples, as the Map access is just a wrapper and there is no tuple-level distinction.

The lower level 'ItemSpace' data model

The 12 primitive data types are called 'components' and they are atomic. Components can be concatenated into short composites called 'Items' which are the unit of storage and retrieval. Higher-level structures that combine these Items are client-devised, and include for example unlimited size records of an unlimited number of columns or attributes, with complex attribute values of unlimited size. Keys may then be a composition of components. Attribute values can be ordered sets of composite components, character large objects (CLOB's), binary large objects (BLOB's), or unlimited sparse arrays. Other higher-level structures built of multiple Items include key/value associations like ordered maps, ordered sets, Entity-Attribute-Value nets of quadruples, trees, DAG's, taxonomies, or full-text indexes. Mixtures of these can occur along with other custom client-defined structures.

Any ItemSpace may be represented as an extended JSON document, and JSON printers and parsers are provided. JSON documents are not native but are mapped to sets of Items when desired, at any scale determined by an Item prefix that represents the path to the sub-document. Hence, the entire database or any subtree of it down to a single value can be represented as extended JSON. Because Items are always kept sorted, the JSON keys of an object are always in order.

Data encoding

An 'ItemSpace' represents the entire database, and it is a simple ordered set of Items, with no other state. An Item is actually stored with each component encoded in variable-length binary form in a char array, with components being self-describing in a standard format which sorts correctly. Programmers deal with the components only as primitives, and the stored data is strongly typed. Data is not stored as text to be parsed with weak typing as in JSON or XML, nor is it parsed out of programmer-defined binary stream representations. There are no custom client-devised binary formats that can grow brittle, and which can have security, documentation, upgrade, testing, versioning, scaling, and debugging problems, such as is the case with Java Object serialization.

Performance scaling

All access to the system is via a few basic methods that can store or retrieve in order one variable-length 'Item' or 'tuple' at a time at a speed which is on the order of 1M operations/second aggregated over multiple threads when in memory. The operations are either the standard Map API for get(), put(), iterators, and so on, or at the lower level, insert(), delete(), update(), first(), next(), last(), and previous(). Typical Items are about 30 bytes uncompressed in memory, but LOB's for example use 1 KB Items. Because each operation affects only one Item, small data structures are fast to access. This is in contrast to chunked access, such as for example formatting and parsing entire JSON or XML texts or entire Java Object serialization graphs. The space and performance scaling of an ItemSpace is smooth as any size of client-imposed multi-Item structure is created, grows, shrinks, or disappears. On-storage performance is like any block-oriented B-tree, with blocks of about 4 KB, which is O(log(n)) per access. There is a block cache of 2.5 MB by default, which is of unlimited size but which is often about 100 MB. The cache grows only as needed.

Space Scaling

For performance and efficiency, the Items are stored inside a single B-tree prefix-compressed and variable length as an uninterpreted sequence of bytes for further compression. The B-tree may typically grow to the 100 GB's range but has no limits. There is only one file, so there is no log or other files to write to and flush. InfinityDB minimizes the size of its database file through four types of compression (prefix, suffix, zlib, and UTF-8).

Schemaless upgrade

Schema upgrade when structures are extended or modified is done by adding or removing Items in new ways at runtime, and there are no upgrade scripts - hence the data model is NoSQL and schemaless.

Besides the normal Java primitive types, there are 'EntityClass' and 'Attribute' types, each identified by a name or number. These are optional 'metadata' that can be mixed in with the other components of any Item. They can be used to represent tables, for example, where each table is given a particular EntityClass near the front of the Item, and each column is given a different Attribute. The Items of the table each start with a particular EntityClass, then therehe are one or more normal primitives representing an 'entity' (like a key) then there is a particular Attribute corresponding to a column, and finally some normal primitives representing the value of that Attribute. This simple pattern can be extended at any time to allow nested tables inside any Attribute, or nested Attributes inside other Attributes, or multi-valued Attributes and much more. There is no fixed schema elsewhere, so new data that arrives in the system describes itself, at Item-level granularity. The EntityClass and Attribute numbers or names can be represented in extended JSON. When data is displayed in the web-based Client/Server database browser, it can be viewed, manipulated, and transferred as a list of sorted formatted Items, or as JSON documents, or as nestable tables whose visible structure is determined by the EntityClass and Attributes that are mixed into the Items. The dynamic loose flexibility of JSON and the formality of tables are combined.

Transactionality

Both global 'ACD' and per-thread 'ACID' transactions are provided. Each InfinityDB instance stores data into a single database file and does not require additional log or rollback files. In the event of any catastrophe except power failure or other hardware malfunction, the database is guaranteed to be consistent with the status as of completion of the last global commit. Recovery after abrupt termination is immediate and requires no slow log replay. Bulk loading is globally transactional with unlimited data size, and is concurrent with all other uses. Global transactions do not provide inter-thread isolation, so the semantics are 'ACD' (rather than 'ACID'). Alternatively, ACID transactions employ optimistic locking to allow inter-thread isolation.

Immediate garbage collection

As data structures grow and shrink, freed space is reclaimed immediately and made available for other structures. Systems can run indefinitely without gradual space leaks or long interruptions during garbage reclamation phases. When a data structure becomes empty, all of its space is recycled, rather than leaving behind 'tombstones' or other place holders. For example, a possibly very large multi-value attribute may shrink to one value, becoming as efficient as any single-valued attribute, and if that last value is deleted, all space for it is reclaimed, including the space for the attribute it was attached to, and if a row has only attributes with no values, the row is reclaimed completely as well. If a table loses all of its rows, the space for the table is reclaimed. Any size or type of data structure has this property. There are no reference counters, hence any type of graph is incrementally collected automatically.

Products

InfinityDB Client/Server (in alpha-testing state) features:

  • A client/server system for secure remote shared access to a set of InfinityDB Embedded database files.
  • Back-end web management console for secure management of users, roles, databases, and permissions.
  • Back-end web secure database browsing and editing with tabular, JSON, and ItemSpace views. Tabular mode displays data as nestable documents, tables, and lists with concurrent editing and update at paragraph or data Item granularity.
  • Secure RESTful access by Python and bash via curl for JSON and BLOB data.
  • Remote access by Java programs using a RemoteItemSpace feature.
  • Pattern Queries for non-SQL arbitrary restructuring and querying of ItemSpace data structures, including the ItemSpace equivalent of the Relational DBMS, select, project, join, and order-by.
  • ItemSuffix Transfer provides data mobility within or between databases with copy, move, difference, union, and intersection.

InfinityDB Encrypted (Version 5) (in beta-testing state) features:

  • Encryption with AES-128 or AES-256 at the database block level
  • Authentication with HMAC-SHA256 at the block level
  • Fast hashing of encrypted blocks
  • Hashing of unencrypted blocks for HMAC authenticity checking
  • Signing with multiple certificates or public keys
  • Incremental signing - each opening of the db allows adding more signatures
  • Certificate storage and organization in the metadata in the single db file
  • Custom client signature verification strategies - 'N of M', any validated cert, more
  • Certificate validation

InfinityDB Embedded (version 4) features:

  • NoSQL model - it is a sorted hierarchical key/value store called an 'ItemSpace' for simplicity, yet generality
  • 1M ops/second, good multi-core scalability
  • Compression up to 10x or more
  • Transactions
  • Instant installation, zero administration: the entire db is in one file
  • JSON printing/parsing with extensions for more data types: JSON can represent any ItemSpace data.
  • Robust file update pattern prevents corruptions
  • Instant recovery after abrupt application exit with no log
  • 12 primitive data types
  • BLOBs/CLOBs, i.e Binary Long Objects and Character Long Objects
  • 'EntityClass' and 'Attribute' metadata primitive data types for flexible, runtime-extensible structures

AirConcurrentMap is a Java ConcurrentNavigableMap implementation. It features:

  • Faster than JDK Maps for medium to large size. This is patent-pending.
  • Better memory efficiency than all standard Java library Maps above about 1K Entries.
  • A proprietary parallel Map scan is faster than that for Java 1.8.
  • forEach is faster than for Java 1.8 Maps.

For both InfinityDB and AirConcurrentMap:

  • Concurrent, multi-threaded processing on multiple cores without locks increases performance on multi-core platforms, such as the Intel i7 by approximately seven times. Both products are patent pending.
  • Standard Java Map access is used. An enhanced java.util.concurrent.ConcurrentNavigable interface is implemented, allowing direct substitution into any existing application or test code. This interface provides specialized concurrency methods as well as ordering features improving the original SortedSet.

History

Roger L. Deran designed and developed the Infinity Database Engine over 20 years ago and holds US Patents 5283894[4] and 10417209.[5] The Infinity Database Engine was first deployed in Intel 8088 assembly language in the ROSCOR sports video editor (RSVE) that was licensed to NFL teams in the 1980s. Lexicon purchased the RSVE in 1989, and greatly expanded its deployment to all types of professional and college sports.[6] Java version 2.0 added transactionality, and version 3.0 added concurrency features which are patent pending, and apply to InfinityDB as well as AirConcurrentMap. Infinity DB remains in active use in thousands of sites of various kinds, while AirConcurrentMap is new.

Uses of the all-JAVA InfinityDB, marketed by Boiler Bay Inc. since 2002, include:

  • consolidation of pharmaceutical and medical data
  • collection, description, consolidation, and sharing of ornithological data
  • representation of taxonomies of various types
  • programming environment tools such as source code repository navigation
  • text indexers
  • email consolidation systems
  • distributed industrial data collection systems.

References

See also