Bush hid the facts

Short description: Bug in Microsoft Windows

"Bush hid the facts" is a common name for a bug in Microsoft Windows that causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts" (without quotes) was typed into a Notepad document that was then saved, closed, and reopened, a nonsensical sequence of Chinese characters, "畂桳栠摩琠敨映捡獴", appeared instead.^[1]
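The misreading can be reproduced without Notepad. The short Python sketch below assumes Python 3 and its built-in "ascii" and "utf-16-le" codecs, and it does not call IsTextUnicode. It only shows what happens once the text has been misdetected: the 18 ASCII bytes of the sentence are decoded as nine UTF-16LE code units.

```python
# ASCII bytes of the trigger sentence (18 bytes, an even count).
raw = "Bush hid the facts".encode("ascii")

# Misreading: each pair of bytes becomes one UTF-16LE code unit,
# with the first byte of the pair taken as the low byte.
garbled = raw.decode("utf-16-le")
print(garbled)            # 畂桳栠摩琠敨映捡獴

# Correct reading of exactly the same bytes.
print(raw.decode("ascii"))  # Bush hid the facts

# First code unit: 'B' (0x42) is the low byte, 'u' (0x75) the high byte.
print(hex(ord(garbled[0])))  # 0x7542
```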

While "Bush hid the facts" is the sentence most commonly presented to induce the error, the bug can also be triggered by other strings such as "hhhh hhh hhh hhhhh",^[2] "this app can break",^[3] and even "a " or "z!".^[1]

Cause

When a text file is opened in Notepad, Windows checks whether the text is encoded in UTF-16 by calling the Win32 charset-detection function IsTextUnicode. One of the function's heuristics guesses that the text is Unicode if the total change between successive "low bytes" (the bytes at even indices, starting at 0) is more than three times the total change between successive "high bytes" (the bytes at odd indices).^[1] If so, it returns true, causing the application to incorrectly interpret the text as UTF-16LE.^[4] Notepad then decodes each pair of bytes as a single character; for typical English text most of the resulting code points fall in the CJK Unified Ideographs range, so the text is rendered as Chinese characters. It is commonly believed that spaces at even indices trigger the bug. The belief stems from the fact that the space character (32) is much farther from the lowercase letters (97 to 122) than the letters are from one another, so spaces at even indices make the low-byte total large.
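To make the ratio test concrete, here is a simplified Python model of it as described in this section. It is a sketch under stated assumptions, not Microsoft's implementation: the function name `looks_like_utf16le` is hypothetical, "total change" is taken to mean the sum of absolute differences between successive bytes of the same parity, and the early rejection of odd-length input mirrors the odd-length workaround listed under Workarounds.

```python
def looks_like_utf16le(data: bytes) -> bool:
    """Hypothetical, simplified model of the low-byte/high-byte ratio test.

    Not Microsoft's code. Assumes "total change" is the sum of absolute
    differences between successive bytes at the same parity.
    """
    # UTF-16 text always has an even number of bytes, so odd input is rejected.
    if len(data) % 2 != 0:
        return False

    low = data[0::2]   # even indices: low bytes of UTF-16LE code units
    high = data[1::2]  # odd indices: high bytes of UTF-16LE code units

    def total_change(seq: bytes) -> int:
        # Sum of |seq[i+1] - seq[i]|; iterating bytes yields ints.
        return sum(abs(b - a) for a, b in zip(seq, seq[1:]))

    # Guess "Unicode" when the low bytes vary more than 3x as much as the high bytes.
    return total_change(low) > 3 * total_change(high)


if __name__ == "__main__":
    samples = ["Bush hid the facts", "Bush hid the facts!",
               "hhhh hhh hhh hhhhh", "this app can break", "Hello world!"]
    for text in samples:
        print(f"{text!r:24} -> {looks_like_utf16le(text.encode('ascii'))}")
```

Run as a script, it reports True for "Bush hid the facts", "hhhh hhh hhh hhhhh" and "this app can break", and False for "Bush hid the facts!" (odd length) and "Hello world!". It also returns False for the two-byte triggers "a " and "z!", which have no successive bytes to compare, so whatever makes the real function accept them lies outside this model.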

The bug had existed since IsTextUnicode was introduced with Windows NT 3.5 in 1994, but was not discovered until early 2004,^[5] when George W. Bush was president of the United States. Many text editors and tools on Windows exhibit the same behavior because they call IsTextUnicode to determine the encoding of text files. In Windows Vista SP1 and Windows Server 2008, Notepad was modified to use a different detection algorithm that does not exhibit the bug, but IsTextUnicode itself remains unchanged, so other tools that use it are still affected.^[6] Microsoft's current documentation for the function notes: "These tests are not foolproof."^[7]

Workarounds

Several workarounds exist for this bug:

Add a character so that the text is an odd number of bytes long. Valid UTF-16 text always has an even byte count, so IsTextUnicode does not guess UTF-16 for odd-length input.
Save the file as "UTF-8" (before 2018) or "UTF-8 with BOM" (after 2018) rather than "ANSI". This prepends a UTF-8 byte order mark (the bytes EF BB BF), which avoids the bug.^[8] UTF-8 without the byte order mark still triggers the bug, because for ASCII text its bytes are identical to those of the "ANSI" file.
Save the file as "Unicode", which in Microsoft Windows terminology means UTF-16LE. When this file is loaded, IsTextUnicode correctly returns true and the text is displayed correctly.
To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select "ANSI" or "UTF-8" in the "Encoding" list box, and click Open. Under Windows 2000, Notepad lacks the "Encoding" list box. WordPad appears to load the text correctly without choosing the encoding, since it uses its own encoding detection.
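The byte-level effect of each workaround can be sketched in Python. This illustrates the bytes involved, not Notepad's own save logic, and rests on two assumptions: the "ANSI" code page is taken to be Windows-1252 (Python's "cp1252" codec), and the UTF-16LE bytes are shown without any byte order mark an editor might also write.

```python
import codecs

TEXT = "Bush hid the facts"

# "ANSI" save, assuming the Windows-1252 code page. For pure ASCII text this is
# byte-for-byte identical to UTF-8 without a BOM, so both trigger the bug.
ansi = TEXT.encode("cp1252")
utf8_no_bom = TEXT.encode("utf-8")

# Workaround 1: one extra character makes the byte count odd.
odd_length = (TEXT + ".").encode("cp1252")

# Workaround 2: "UTF-8 with BOM"; Python's utf-8-sig codec prepends EF BB BF.
utf8_bom = TEXT.encode("utf-8-sig")

# Workaround 3: "Unicode" in Windows terms, i.e. UTF-16LE (BOM omitted here).
utf16le = TEXT.encode("utf-16-le")

# Recovery: decoding the original bytes with an explicit encoding,
# instead of letting a detector guess, gives back the text.
recovered = ansi.decode("cp1252")

for name, data in [("ANSI", ansi), ("UTF-8, no BOM", utf8_no_bom),
                   ("odd length", odd_length), ("UTF-8 with BOM", utf8_bom),
                   ("UTF-16LE", utf16le)]:
    print(f"{name:15} {len(data):2d} bytes, first bytes: {data[:4].hex()}")
print(recovered)
```

Choosing the encoding explicitly on load, as the "Encoding" list box does, plays the same role as the final `decode` call: no detection heuristic is consulted.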

References

[1] "'Bush hid the facts' Bug EXPLAINED" (July 4, 2023). YouTube. https://www.youtube.com/watch?v=sPShnuBSvBg. Retrieved September 4, 2024.
[2] Christensen, Brett M. (November 2, 2009). "Bush Hid The Facts - Notepad Conspiracy Claim". http://www.hoax-slayer.com/bush-hid-the-facts-notepad.html.
[3] Kaplan, Michael S. (June 14, 2006). "Behind 'How to break Windows Notepad'". http://blogs.msdn.com/b/michkap/archive/2006/06/14/631016.aspx.
[4] Chen, Raymond (March 24, 2004). "Some files come up strange in Notepad". Microsoft. https://devblogs.microsoft.com/oldnewthing/20040324-00/?p=40093.
[5] Cumps, David (February 27, 2004). "Notepad bug? Encoding issue?". #region .Net Blog. http://weblogs.asp.net/cumpsd/archive/2004/02/27/81098.aspx.
[6] Kaplan, Michael S. (March 25, 2008). "Bush might've still hid the facts, but he can't hide them from Vista SP1/Server 2008 Notepad!". http://archives.miloush.net/michkap/archive/2008/03/25/8334796.html.
[7] "IsTextUnicode function (winbase.h)" (October 13, 2021). Microsoft. https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode.
[8] Cantu, Marco (October 4, 2018). "The Delphi Compiler and UTF-8 Encoded Source Code Files With no BOM". https://blogs.embarcadero.com/the-delphi-compiler-and-utf-8-encoded-source-code-files-with-no-bom/.

External links

The Notepad file encoding problem, redux – Raymond Chen
IsTextUnicode – Microsoft Docs
Censor oracle – A tool to identify strings that might trigger the bug (source code on GitHub)

Original source: https://en.wikipedia.org/wiki/Bush hid the facts.