Agnostic (data)

From HandWiki

In computing, a device or software program is said to be agnostic or data agnostic if the method or format of data transmission is irrelevant to the device or program's function. This means that the device or program can receive data in multiple formats or from multiple sources, and still process that data effectively.

Definition

Many devices or programs need data to be presented in a specific format to process the data. For example, Apple Inc devices generally require applications to be downloaded from their App Store.[1] This is a non data-agnostic method, as it uses a specified file type, downloaded from a specific location, and does not function unless those requirements are met.

Non data-agnostic devices and programs can present problems. For example, if your file contains the right type of data (such as text), but in the wrong format, you may have to create a new file and enter the text manually in the proper format in order to use that program. Various file conversion programs exist because people need to convert their files to a different format in order to use them effectively.[2][3][4][5]

Implementation

Data agnostic devices and programs work to solve these problems in a variety of ways. Devices can treat files in the same way whether they are downloaded over the internet or transferred over a USB or other cable.

Devices and programs[6] can become more data-agnostic by using a generic storage format to create, read, update, and delete files. Formats like XML and JSON can store information in a data agnostic manner. For example, XML is data agnostic in that it can save any type of information. However, if you use Data Transform Definitions (DTD) or XML Schema Definitions (XSD) to define what data should be placed where, it becomes non-data agnostic; it produces an error if the wrong type of data is placed in a field.

Once you have your data saved in a generic storage format, this source can act as an entity synchronization layer. The generic storage format can interface with a variety of different programs, with the data extraction method formatting the data in a way that the specific program can understand. This allows two programs that require different data formats to access the same data. Multiple devices and programs can create, read, update, and delete (CRUD) the same information from the same storage location without formatting errors.

When multiple programs are accessing the same records, they may have different defined fields for the same type of concept. Where the fields are differently labelled but contain the same data, the program pulling the information can ensure the correct data is used. If one program contains fields and information that another does not, those fields can be saved to the record and pulled for that program, but ignored by other programs. As the entity synchronization layer is data agnostic, additional fields can be added without worrying about recoding the whole database, and concepts created in other programs (that do not contain that field) are fine.

Since the information formatting is imposed on the data by the program extracting it, the format can be customized to the device or program extracting and displaying that data. The information extracted from the entity synchronization layer can therefore be dynamically rendered to display on the user's device, regardless of the device or program being used.

Having data agnostic devices and programs allows you to transfer data easily between them, without having to convert that data. Companies like Great Ideaz[7] provide data agnostic services by storing the data in an entity synchronization layer. This acts as a compatibility layer, as TSQL statements can retrieve, update, sort, and write data regardless of the format employed. It also allows you to synchronize data between multiple applications, as the applications can all pull data from the same location. This prevents compatibility problems between different programs that have to access the same data, as well as reducing data replication.

Benefits

Keeping your devices and programs as data agnostic as possible has some clear advantages. Since the data is stored in an agnostic format, developers do not need to hard-code ways to deal with all different kinds of data. A table with information about dogs and one with information about cats can be treated in the same way; extract the field definitions and the field content from the data agnostic storage format and display it based on the field definitions. Using the same code for the different concepts to CRUD, the amount of code is significantly reduced, and what remains is tested with each concept you extract from the entity synchronization layer.

The field definitions and formatting can be stored in the entity synchronization layer with the data they are acting on. Allows fields and formatting to change, without having to hardcode and compile programs. The data and formatting are then generated dynamically by the code used to extract the data and the formatting information.

The data itself only needs to be distinguished when it is being acted on or displayed in a specific way. If the data is being transferred between devices or databases, it does not need to be interpreted as a specific object. Whenever the data can be treated as agnostic, the coding is simplified, as it only has to deal with one case (the data agnostic case) rather than multiple (png, pdf, etc.). When the data must be displayed or acted on, then it is interpreted based on the field definitions and formatting information, and returned to a data agnostic format as soon as possible to reduce the number of individual cases that must be accounted for.

Risks

There are, however, a few problems introduced when attempting to make a device or program data agnostic. Since only one piece of code is being used for CRUD operations (regardless of the type of concept), there is a single point of failure. If that code breaks down, the whole system is broken. This risk is mitigated because the code is tested so many times (as it is used every time a record is stored or retrieved).

Additionally, data agnostic storage mediums can increase load speed, as the code has to search for the field definitions and display format as well as the specific data to be displayed. The load speed can be improved by pre-shredding the data. This uses a copy of the record with the data already extracted to index the fields, instead of having to extract the fields and formatting information at the same time as the data. While this improves the speed, it adds a non-data agnostic element to the process; however, it can be created easily through code generation.

References