Tab-separated values
| Filename extension | .tsv, .tab[1] |
|---|---|
| Internet media type | text/tab-separated-values |
| Uniform Type Identifier (UTI) | public.tab-separated-values-text[2] |
| UTI conformation | public.delimited-values-text[2] |
| Developed by | University of Minnesota Internet Gopher Team Internet Assigned Numbers Authority |
| Initial release | c. June 1993 |
| Type of format | Delimiter-separated values format |
| Container for | database information organized as field separated lists |
| Standard | IANA MIME type |
Tab-separated values (TSV) is a text data format for storing tabular data where records are separated by newline and values within a record are separated by tabs.[3] The TSV format is a delimiter-separated values (DSV) and is similar to comma-separated values (CSV).
TSV is a relatively simple format and is widely supported. It is often used in data exchange to move tabular data between different computer programs that support the format. For example, a TSV file might be used to transfer information from a database to a spreadsheet.
Example
The head of the Iris flower data set can be stored as a TSV using the following plain text (note that the HTML rendering may convert tabs to spaces):
Sepal length Sepal width Petal length Petal width Species 5.1 3.5 1.4 0.2 I. setosa 4.9 3.0 1.4 0.2 I. setosa 4.7 3.2 1.3 0.2 I. setosa 4.6 3.1 1.5 0.2 I. setosa 5.0 3.6 1.4 0.2 I. setosa
The TSV plain text above corresponds to the following tabular data:
| Sepal length | Sepal width | Petal length | Petal width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | I. setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | I. setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | I. setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | I. setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | I. setosa |
Character escaping
The IANA media type standard for TSV achieves simplicity by simply disallowing tabs within fields.[4]
Since the values in the TSV format cannot contain literal tabs or newline characters, a convention is necessary for lossless conversion of text values with these characters. A common convention is to perform the following escapes:[5][6]
| escape sequence | meaning |
|---|---|
\n
|
line feed |
\t
|
tab |
\r
|
carriage return |
\\
|
backslash |
Another common convention is to use the CSV convention from RFC 4180 and enclose values containing tabs or newlines in double quotes. This can lead to ambiguities.[7][8]
Line endings
Records are typically separated by a line feed, as is typical for Unix platforms, or a carriage return and line feed, as is typical for Microsoft platforms. Some programs may expect the latter. The de-facto specification[9] specifies that records are separated by an EOL, but does not specify any specific newline.
See also
- Comma-separated values
- Delimiter collision
References
- ↑ U of Edin. Research Data Support Team. "Choose the best file formats". University of Edinburgh. § Formats we recommend. https://www.ed.ac.uk/information-services/research-support/research-data-service/after/data-repository/choosing-file-formats.
- ↑ 2.0 2.1 "tabSeparatedText". Apple Developer Documentation: Uniform Type Identifiers. Apple Inc. https://developer.apple.com/documentation/uniformtypeidentifiers/uttype/3551577-tabseparatedtext.
- ↑ "How To Use Tab Separated Value (TSV) files". International Monetary Fund. https://www.imf.org/external/help/tsv.htm.
- ↑ Lindner 1993.
- ↑ Dusek, Jason (2014-05-06). "Linear TSV: simple, line-oriented, tabular data". http://dataprotocols.org/linear-tsv/.
- ↑ Dolan, Stephen (2018-11-01). "jq Manual". https://stedolan.github.io/jq/manual/.
- ↑ Miller, Rob (2015-09-22) (in en). Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Bookshelf. pp. 94. ISBN 978-1-68050-492-7. https://books.google.com/books?id=dg9QDwAAQBAJ&pg=PT94.
- ↑ Giuseppini, Gabriele; Burnett, Mark (2005-02-10) (in en). Microsoft Log Parser Toolkit: A Complete Toolkit for Microsoft's Undocumented Log Analysis Tool. Elsevier. pp. 311. ISBN 978-0-08-048939-1. https://books.google.com/books?id=vnIXo-yUT2gC.
- ↑ "IANA: text/tab-separated-values". https://www.iana.org/assignments/media-types/text/tab-separated-values.
Sources
- "TSV — Tab-Separated Values". Library of Congress. https://www.loc.gov/preservation/digital/formats/fdd/fdd000533.shtml.
- Lindner, Paul (June 1993). "text/tab-separated-values". Assigned Media Types Registry. University of Minnesota Internet Gopher Team. https://www.iana.org/assignments/media-types/text/tab-separated-values.
- "How To Use Tab Separated Value (TSV) Files". https://www.imf.org/external/help/tsv.htm.
Further reading
- Jukka, Korpela (2000-09-01). "Tab Separated Values (TSV): a format for tabular data exchange". https://jkorpela.fi/TSV.html.
- Welinder, Morten (2012-12-19). The Gnumeric Manual (v1.12 ed.).
