Flat-file database

From HandWiki
Short description: Database stored as an ordinary unstructured file


Example of a flat file model[1]

A flat-file database is a database stored in a file called a flat file. Records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. The file is simple. A flat file can be a plain text file (e.g. csv, txt or tsv), or a binary file. Relationships can be inferred from the data in the database, but the database format itself does not make those relationships explicit.

The term has generally implied a small database, but very large databases can also be flat.

Overview

Plain text files usually contain one record per line.[2] There are different conventions for depicting data. In comma-separated values and delimiter-separated values files, fields can be separated by delimiters such as comma or tab characters. In other cases, each field may have a fixed length; short values may be padded with space characters. Extra formatting may be needed to avoid delimiter collision.

Using delimiters incurs some overhead in locating them every time they are processed (unlike fixed-width formatting), which may have performance implications. However, use of character delimiters (especially commas) is also a crude form of data compression which may assist overall performance by reducing data volumes — especially for data transmission purposes. Use of character delimiters which include a length component (Declarative notation) is comparatively rare but vastly reduces the overhead associated with locating the extent of each field.

Examples of flat files include /etc/passwd and /etc/group on Unix-like operating systems. Another example of a flat file is a name-and-address list with the fields Name, Address, and Phone Number.

A list of names, addresses, and phone numbers written by hand on a sheet of paper is a flat-file database. This can also be done with any typewriter or word processor. A spreadsheet or text editor program may be used to implement a flat-file database, which may then be printed or used online for improved search capabilities.

History

Herman Hollerith's work for the US Census Bureau first exercised in the 1890 United States Census, involving data tabulated via hole punches in paper cards,[3] is sometimes considered the first computerized flat-file database, as it included no cards indexing other cards, or otherwise relating the individual cards to one another, save by their group membership.[citation needed]

In the 1980s, configurable flat-file database computer applications were popular on the IBM PC and the Macintosh. These programs were designed to make it easy for individuals to design and use their own databases, and were almost on par with word processors and spreadsheets in popularity.[citation needed] Examples of flat-file database software include early versions of FileMaker and the shareware PC-File and the popular dBase.

Flat-file databases are common and ubiquitous because they are easy to write and edit, and suit myriad purposes in an uncomplicated way.

Modern implementations

Linear stores of NoSQL data, JSON formatted data, primitive spreadsheets (perhaps comma-separated or tab-delimited), and text files can all be seen as flat-file databases, because they lack integrated indexes, built-in references between data elements, or complex data types. Programs to manage collections of books or appointments and address book may use essentially single-purpose flat-file databases, storing and retrieving information from flat files unadorned with indexes or pointing systems.

While a user can write a table of contents into a text file, the text file format itself does not include a concept of a table of contents. While a user may write "friends with Kathy" in the "Notes" section for John's contact information, this is interpreted by the user rather than a built-in feature of the database. When a database system begins to recognize and codify relationships between records, it begins to drift away from being "flat," and when it has a detailed system for describing types and hierarchical relationships, it is now too structured to be considered "flat."

Example database

The following example illustrates typical elements of a flat-file database. The data arrangement consists of a series of columns and rows organized into a tabular format. This specific example uses only one table.

The columns include: name (a person's name, second column); team (the name of an athletic team supported by the person, third column); and a numeric unique ID, (used to uniquely identify records, first column).

Here is an example textual representation of the described data:

id    name    team
1     Amy     Blues
2     Bob     Reds
3     Chuck   Blues
4     Richard Blues
5     Ethel   Reds
6     Fred    Blues
7     Gilly   Blues
8     Hank    Reds
9     Hank    Blues

This type of data representation is quite standard for a flat-file database, although there are some additional considerations that are not readily apparent from the text:

  • Data types: each column in a database table such as the one above is ordinarily restricted to a specific data type. Such restrictions are usually established by convention, but not formally indicated unless the data is transferred to a relational database system.
  • Separated columns: In the above example, individual columns are separated using whitespace characters. This is also called indentation or "fixed-width" data formatting. Another common convention is to separate columns using one or more delimiter characters, such as a tab or comma.
  • Relational algebra: Each row or record in the above table meets the standard definition of a tuple under relational algebra (the above example depicts a series of 3-tuples). Additionally, the first row specifies the field names that are associated with the values of each row.
  • Database management system: Since the formal operations possible with a text file are usually more limited than desired, the text in the above example would ordinarily represent an intermediary state of the data prior to being transferred into a database management system.

See also

  • /etc/passwd, a commonly used flat file, used to detail users in Unix
  • CSV (standard Comma-Separated Values)
  • Berkeley DB (typical flat-file database)
  • Awk (classical flat-file processor)
  • Recfiles (plain text database file format)

References

  1. Data Integration Glossary , U.S. Department of Transportation, August 2001.
  2. Fowler, Glenn (1994), "cql: Flat-file database query language", WTEC'94: Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, http://static.usenix.org/publications/library/proceedings/sf94/full_papers/fowler.a 
  3. Blodgett, John H.; Schultz, Claire K. (1969). "Herman hollerith: data processing pioneer" (in en). American Documentation 20 (3): 221–226. doi:10.1002/asi.4630200307. ISSN 1936-6108. https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.4630200307. 

it:Flat file