8-bit clean

Short description: Computer system that correctly handles 8-bit character encodings


8-bit clean is an attribute of computer systems, communication channels, and other devices and software that process 8-bit character encodings without treating any byte as an in-band control code.

History

Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g., ETX, as control characters. Others assumed a stream of seven-bit characters, with values between 0 and 127; for example, the ASCII standard used only seven bits per character, avoiding an 8-bit representation in order to save on data transmission costs. On computers and data links using 8-bit bytes, this left the top bit of each byte free for use as a parity bit, flag bit, or metadata control bit. 7-bit systems and data links cannot directly handle the more complex character codes that are commonplace in non-English-speaking countries with larger alphabets.

Binary files of octets cannot be transmitted through 7-bit data channels directly. To work around this, binary-to-text encodings have been devised that use only 7-bit ASCII characters. These encodings include uuencoding, Ascii85, SREC, BinHex, Kermit, and MIME's Base64. EBCDIC-based systems cannot handle all characters used in uuencoded data; Base64 does not have this problem.
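As a minimal illustration of a binary-to-text encoding, the following Python sketch uses the standard library's `base64` module. The payload bytes are arbitrary values chosen for this example, not taken from any source.

```python
import base64

# Arbitrary binary payload; several bytes have the high bit set.
payload = bytes([0x00, 0x7F, 0x80, 0xE9, 0xFF])

# Base64 represents every 3 input octets as 4 characters taken from
# A-Z, a-z, 0-9, '+' and '/', with '=' used as padding.
encoded = base64.b64encode(payload)

# Every encoded byte is below 128, so the text fits a 7-bit channel.
is_seven_bit = all(b < 0x80 for b in encoded)

# Decoding at the receiving end restores the original octets exactly.
decoded = base64.b64decode(encoded)

print(encoded.decode("ascii"), is_seven_bit, decoded == payload)
```

The cost of this approach is size: the encoded form is roughly one third larger than the original data.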

SMTP and NNTP 8-bit cleanness

Historically, various media were used to transfer messages, some of which supported only 7-bit data, so in the 20th century an 8-bit message was likely to be garbled in transit. Some implementations, however, ignored the formal discouragement of 8-bit data and passed bytes with the high bit set through unchanged. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
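The definition above can be sketched in Python by modelling a channel as a function from bytes to bytes. The helper names `strip_high_bit`, `identity`, and `is_8bit_clean` are hypothetical and exist only for this illustration.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Model a 7-bit channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def identity(data: bytes) -> bytes:
    """Model a channel that passes every byte through unchanged."""
    return data


def is_8bit_clean(channel) -> bool:
    """Send all 256 byte values through the channel and compare."""
    probe = bytes(range(256))
    return channel(probe) == probe


# "é" is 0xE9 in ISO 8859-1; clearing the high bit yields 0x69, i.e. "i".
garbled = strip_high_bit("café".encode("latin-1"))
print(garbled)                        # b'cafi'
print(is_8bit_clean(strip_high_bit))  # False
print(is_8bit_clean(identity))        # True
```

This probe only checks that the high bit survives. A real channel could still fail in other ways, for example by treating certain byte values as control codes only in particular contexts.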

Many early communications protocol standards, such as RFC 780, 788, 821, 2821, 5321 (for SMTP), RFC 977 (for NNTP) and RFC 1056, were designed to work over such "7-bit" communication links. They specifically require the use of the ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero", and some of these[1] explicitly restrict all data to 7-bit characters.

For the first few decades of email networks (1971 to the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set.[2]

The RFC 788 definition of SMTP, like its predecessor RFC 780, limits Internet Mail to lines (1000 characters or fewer) of 7-bit US-ASCII characters.[3][4][5][6]

Later, the format of email messages was redefined to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images).[6] The header field Content-Transfer-Encoding: binary[lower-alpha 1] requires an 8-bit clean transport.

RFC 3977[7] specifies "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel." and changes the character set for commands to UTF-8. However, RFC 5536[8] still limits the character set to ASCII, including RFC 2047[9] and RFC 2231[10] MIME encoding of non-ASCII data.

The Internet community generally adds features by extension, allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. In the mid-1990s, there were objections to "just send 8 bits (to RFC 821 SMTP servers)", perhaps because of a perception that "just send 8 bits" is an implicit declaration that ISO 8859-1 should become the new "standard encoding", forcing everyone in the world to use the same character set. Instead, the recommended way to take advantage of 8-bit-clean links between machines is to use the ESMTP (RFC 1869) 8BITMIME extension[11][12] for message bodies and the SMTPUTF8[13] extension for message headers. Despite this, some mail transfer agents, notably Exim and qmail, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically quoted-printable, "Q-P conversion") required by RFC 6152. In practice, this "just-send-8" attitude does not cause problems, since virtually all modern email servers are 8-bit clean.[14]
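The relay decision described above can be sketched as follows, using Python's standard `quopri` module for the quoted-printable conversion. The function `choose_body_encoding` and its `ehlo_keywords` argument are hypothetical names for this illustration. A real mail transfer agent would also have to rewrite the message's MIME headers and handle multipart bodies, which this sketch leaves out.

```python
import quopri


def choose_body_encoding(ehlo_keywords, body):
    """Pick a Content-Transfer-Encoding for `body` (bytes).

    `ehlo_keywords` is the set of extension keywords the next-hop
    server advertised in its EHLO reply (assumed input for this sketch).
    """
    # Pure 7-bit content needs neither the extension nor a conversion.
    if all(b < 0x80 for b in body):
        return "7bit", body
    # The server accepts 8-bit bodies: send the octets as they are.
    advertised = {k.upper() for k in ehlo_keywords}
    if "8BITMIME" in advertised:
        return "8bit", body
    # Otherwise fall back to a 7-bit-safe representation.
    return "quoted-printable", quopri.encodestring(body)


body = "café".encode("utf-8")
print(choose_body_encoding({"PIPELINING", "8BITMIME"}, body))
print(choose_body_encoding({"PIPELINING"}, body))
```

A "just-send-8" relay corresponds to skipping the final branch and returning the unmodified body even when 8BITMIME was not advertised.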

Notes

  1. The header field Content-Transfer-Encoding: 8bit does not designate an 8-bit clean transport, since CRLF retains special significance as the line terminator.

References

  1. RFC 780: Appendix A, RFC 788: 4.5.2., RFC 821: Appendix B, RFC 1056: 4.
  2. John Beck. "Email Explained". 2011.
  3. Template:Cite RFC
  4. Template:Cite RFC
  5. Dan Sugalski. "E-mail with Attachments". "The Perl Journal". Summer 1999. "When mail was standardized way back in 1982 with RFC822, ... The only limits placed on the body were the character set (7-bit ASCII) and the maximum line length (1000 characters)."
  6. Template:Cite RFC
  7. C. Feather (October 2006), Network News Transfer Protocol (NNTP), doi:10.17487/RFC3977, RFC 3977, https://tools.ietf.org/html/rfc3977 
  8. C. Lindsey; D. Kohn (November 2009), K. Murchison, ed., Netnews Article Format, doi:10.17487/RFC5536, RFC 5536, https://tools.ietf.org/html/rfc5536 
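  9. K. Moore (November 1996), MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text, doi:10.17487/RFC2047, RFC 2047, https://tools.ietf.org/html/rfc2047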
  10. N. Freed; K. Moore (November 1997), MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations, doi:10.17487/RFC2231, RFC 2231, https://tools.ietf.org/html/rfc2231 
  11. Theodore Ts'o; Keith Moore (12 September 1994). "8-bit transmission in NNTP". IETF-SMTP mail list. http://www.imc.org/ietf-smtp/old-archive/msg02018.html. 
  12. "comp.mail.mime FAQ, part 3 'What's ESMTP, and how does it affect MIME?'". Usenet FAQs. 8 August 1997. http://www.uni-giessen.de/faq/archiv/mail.mime-faq.part1-9/msg00002.html. 
  13. J. Yao; W. Mao (February 2012), SMTP Extension for Internationalized Email, doi:10.17487/RFC6531, RFC 6531, https://tools.ietf.org/html/rfc6531
  14. "The 8BITMIME extension". http://cr.yp.to/smtp/8bitmime.html.