SQUOZE

Short description: Compression scheme

SQUOZE (abbreviated as SQZ) is a memory-efficient representation of a combined source and relocatable object program file with a symbol table on punched cards. It was introduced in 1958 with the SCAT assembler[1][2] on the SHARE Operating System (SOS) for the IBM 709.[3][4] A program in this format was called a SQUOZE deck.[5][6][7] The format was also used on later machines, including the IBM 7090 and 7094.

Encoding

In the SQUOZE encoding, identifiers in the symbol table were represented in a 50-character alphabet, allowing a 36-bit machine word to hold six alphanumeric characters plus two flag bits.[6][1] The conventional encoding allocated six bits to each character, which can distinguish 64 states although only 50 are needed for the 50 characters of the alphabet. Treating the six characters as a single base-50 number recovers the slack: because [math]\displaystyle{ 50^6 < 2^{34} }[/math], a whole symbol fits in 34 bits, saving two bits per six characters.
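The bit counts in this argument can be checked with a few lines of arithmetic. The following is a minimal Python sketch written for illustration, not taken from the SOS manual; the helper name `bits_needed` and the constant names are assumptions.

```python
def bits_needed(states: int) -> int:
    """Return the number of bits needed to distinguish `states` distinct values."""
    return (states - 1).bit_length()

ALPHABET_SIZE = 50   # size of the SQUOZE character set
SYMBOL_LENGTH = 6    # characters per symbol
WORD_BITS = 36       # IBM 709 word length

# Character-by-character encoding: six characters at six bits each.
per_char_bits = SYMBOL_LENGTH * bits_needed(ALPHABET_SIZE)
# Whole symbol read as one six-digit base-50 number.
packed_bits = bits_needed(ALPHABET_SIZE ** SYMBOL_LENGTH)

print(per_char_bits)            # 36
print(packed_bits)              # 34
print(WORD_BITS - packed_bits)  # 2 bits left over for flags
```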

SQUOZE character codes[1]
Rows: most significant digits. Columns: least significant digits.
Dec +0 +1 +2 +3 +4 +5 +6 +7
Oct 0 1 2 3 4 5 6 7
Dec Oct Bin 000 001 010 011 100 101 110 111
+0 0 000 space 0 1 2 3 4 5 6
+8 1 001 7 8 9 A B C D E
+16 2 010 F G H I J K L M
+24 3 011 N O P Q R S T U
+32 4 100 V W X Y Z = # / % ) ⌑
+40 5 101 + & - - @ + & - * / $
+48 6 110 , . N/A N/A N/A N/A N/A N/A

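Base 50 already saves one bit for every three characters, because three base-50 digits fit in 17 bits rather than 18 (50 cubed is 125,000, which is below 2 to the 17th power, 131,072). Each symbol was therefore packed as two three-character chunks. The manual[1] gives a formula for encoding six characters ABCDEF: [math]\displaystyle{ (A*50^2 + B*50 + C) * 2^{17} + (D*50^2 + E*50 + F) }[/math]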

For example, "SQUOZE" would normally occupy 36 bits, with the six-bit character codes 35 33 37 31 44 17 (base 8). In SQUOZE form it is encoded as two 17-bit pieces that fit in 34 bits: ( 0o220231 << 17 ) | 0o175473 == 0o110114575473.
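The formula and the worked example can be reproduced with a short Python sketch. It makes several assumptions: only codes 0 to 36 (blank, digits, letters) are modelled, as read from the character table above, while the punctuation codes are left out; the two flag bits are left at zero; and the heading-character rule described in the manual is not modelled. The function names `pack3`, `squoze_encode`, and `squoze_decode` are made up for illustration.

```python
# Codes 0-36 as read from the character table: blank, digits, then letters.
# The punctuation codes (37-49) are deliberately omitted from this sketch.
ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODE = {ch: i for i, ch in enumerate(ALPHABET)}

def pack3(chars: str) -> int:
    """Read three characters as a base-50 number (always below 2**17)."""
    a, b, c = (CODE[ch] for ch in chars)
    return a * 50**2 + b * 50 + c

def squoze_encode(symbol: str) -> int:
    """Pack up to six characters into 34 bits using the manual's formula."""
    if len(symbol) > 6:
        raise ValueError("a symbol holds at most six characters")
    padded = symbol.ljust(6)  # left-justify, fill unused positions with blanks
    # The low half is below 2**17, so OR-ing is the same as adding.
    return (pack3(padded[:3]) << 17) | pack3(padded[3:])

def squoze_decode(value: int) -> str:
    """Invert squoze_encode for the blank/digit/letter subset."""
    chars = []
    for half in (value >> 17, value & (2**17 - 1)):
        chars += [ALPHABET[half // 2500], ALPHABET[half // 50 % 50], ALPHABET[half % 50]]
    return "".join(chars)

word = squoze_encode("SQUOZE")
print(oct(pack3("SQU")), oct(pack3("OZE")))  # 0o220231 0o175473
print(oct(word))                              # 0o110114575473
print(squoze_decode(word))                    # SQUOZE
```

Splitting the symbol into two halves keeps every intermediate value within 17 bits, which is presumably why the manual's formula is written in terms of two three-character groups rather than one six-digit base-50 number.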

The same logic appears in a simpler setting. A three-digit BCD number takes up 12 bits, so 987 is stored as 9 8 7(base 16), or 1001 1000 0111(base 2). Any three-digit value can instead be stored directly as a 10-bit binary number, saving two bits: 987 becomes 3db(base 16), or 11 1101 1011(base 2).
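The BCD comparison can be checked the same way. This is an illustrative Python sketch; the helper `to_bcd` is a hypothetical name, not part of any library.

```python
def to_bcd(n: int) -> int:
    """Pack each decimal digit of n into its own 4-bit group."""
    bcd = 0
    for shift, digit in enumerate(reversed(str(n))):
        bcd |= int(digit) << (4 * shift)
    return bcd

n = 987
print(f"{to_bcd(n):012b}")  # 100110000111 (12 bits, reads as hex 987)
print(f"{n:010b}")          # 1111011011   (10 bits, hex 3db)
print((999).bit_length())   # 10: every three-digit value fits in 10 bits
```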

Etymology

"Squoze" is a facetious past participle of the verb 'to squeeze'.[5][6]

The name SQUOZE was later borrowed for similar schemes used on DEC machines;[4] they had a 40-character alphabet (50 in octal) and were called DEC RADIX 50 and MOD40,[8] but sometimes nicknamed DEC Squoze.

References

  1. SHARE 709 System Committee, ed. (June 1961). "Section 02: SCAT Language; Appendix 1: Table of Permissible Characters; Appendix 3: SQUOZE Deck Format - Chapter 8: Dictionary". SOS Reference Manual - SHARE System for the IBM 709. New York, USA: SOS Group, International Business Machines Corporation. pp. 02.00.01 – 02.00.11, 12.03.08.01 – 12.03.08.02, 12.01.00.01. X28-1213. Distribution No. 1–5. http://bitsavers.org/pdf/ibm/share/SOS_Reference_Manual_Jun61.pdf. Retrieved 2020-06-18. "[…] Bit Positions Used […] Bit 0 […] Bit 1 […] Bits 2–35 […] Base 50 representation of the symbol with heading character. […] The base 50 representation of a symbol is obtained as follows: […] a. If the symbol has fewer than five characters, it is headed (by blank if it is in an unheaded region). […] b. The symbol with it[s] heading character is left-justified and any unused low-order positions are filled with blanks. […] c. Each character in the symbol is replaced by it[s] base 50 equivalent. […] d. The result is then converted by the following: if the symbol, after each character is rep[l]aced by its base 50 equivalent, is ABCDEF, its base 50 representation is (A*50^2+B*50+C)*2^17+(D*50^2+E*50+F). […]"
  2. Chivers, Ian D., ed. (February 1993). Written at California State University, Northridge, California, USA. Assemblers and Loaders. Ellis Horwood Series In Computers And Their Applications (1 ed.). Chichester, West Sussex, UK: Ellis Horwood Limited / Simon & Schuster International Group. ISBN 0-13-052564-2. http://www.davidsalomon.name/assem.advertis/asl.pdf. Retrieved 2008-10-01.  (xiv+294+4 pages)
  3. "Part I Chapter 3.1.3 On-Line Locality Optimizations: Dynamic Compression of Instructions and Data". Memory Systems: Cache, DRAM, Disk. The Morgan Kaufmann Series in Computer Architecture and Design. Morgan Kaufmann Publishers / Elsevier. 2008. p. 147. ISBN 978-0-12-379751-3. https://books.google.com/books?id=SrP3aWed-esC&pg=PA147.  (900 pages)
  4. "Lecture 7, Object Codes, Loaders and Linkers - Final steps on the road to machine code". Operating Systems, Spring 2018. The University of Iowa, Department of Computer Science. 2018. http://homepage.divms.uiowa.edu/~jones/opsys/notes/07.shtml.
  5. "Machine Implementation of Symbolic Programming - Summary of a Paper to be Presented at the Summer 1958 Meeting of the ACM". ACM '58: Preprints of papers presented at the 13th national meeting of the Association for Computing Machinery. June 1958. pp. 17-1 – 17-3. doi:10.1145/610937.610953. https://dl.acm.org/doi/pdf/10.1145/610937.610953. Retrieved 2020-06-06.  (3 pages)
  6. "The SHARE 709 System: Machine Implementation of Symbolic Programming". Journal of the ACM 6 (2): 134–140. April 1959. doi:10.1145/320964.320968. https://dl.acm.org/doi/pdf/10.1145/320964.320968. Retrieved 2020-06-04. "[…] There is an interesting feature related to the encoding of symbols for inclusion in the dictionary. In the usual mode of expression, symbols may be constructed from a set of 50 characters. If encoding were character by character, six bits would be required for the representation of each such character. As a symbol may contain as many as six characters, a total of 36 bits would be required for the representation of each symbol. This might seem convenient, as the length of a 709 word is exactly 36 bits, but a moment's consideration shows that it is unfortunate as it would be desirable to have a bit or two available in the same word as the symbol representation, giving a clue to the nature of the symbol. These flagging bits can be obtained. Let each character possible represent a digit in a number system having a base of fifty. Now six character symbols may be read as natural numbers in a base fifty system. If these numbers are converted to the usual base two system, only 34 bits are required for the maximum number and a gain of two flag bits has been made. This has the incidental feature of decreasing the requisite number of bits for representing the entire code, but conversion time would outweigh the saving by a significant margin were it not for the peculiar length of the 709 word. Here is a clear illustration of the critical effect the precise specifications of the machine concerned hold over the details of an encoding schema. […]".  (7 pages)
  7. "The SHARE 709 System: A Cooperative Effort". Journal of the ACM 6 (2): 123–127. April 1959. doi:10.1145/320964.320966. https://dl.acm.org/doi/pdf/10.1145/320964.320966. Retrieved 2020-06-16.  (5 pages)
  8. "8.10 .RAD50". PAL-11R Assembler - Programmer's Manual - Program Assembly Language and Relocatable Assembler for the Disk Operating System (2nd revised printing ed.). Maynard, Massachusetts, USA: Digital Equipment Corporation. May 1971. p. 8-8. DEC-11-ASDB-D. https://archive.org/details/bitsavers_decpdp11do11RAssemblerProgrammersManualMay71_2572677. Retrieved 2020-06-18. "[…] PDP-11 systems programs often handle symbols in a specially coded form called RADIX 50 (this form is sometimes referred to as MOD40). This form allows 3 characters to be packed into 16 bits […]"

Further reading

  • "The PORTHOS Executive System for the IBM 7094 - User's Manual". University of Illinois, Graduate College Digital Computer Laboratory. 1964-04-15. https://core.ac.uk/download/pdf/4834584.pdf. "[…] SCAT is a two part assembler which in brief operates as follows: Programs written symbolically as one order per card are ingested during the first phase by the "compiler" which scans the program for symbols and outputs a condensed deck of cards (SQUOZE deck) containing tables of these symbols and the program condensed and efficiently coded. During the second phase this SQUOZE deck is ingested by the "modify and load" program which converts the object program to binary machine language which by option can either be loaded ready to run or output on absolute binary cards (23 orders per card) for loading and running at a later time. The "lister" can produce a printed version of the program at either of these stages. Symbolic corrections to a program can be inserted into the second phase along with the SQUOZE deck. […]"  (1 page)