NS320xx

The NS32000, sometimes known as the 32k, is a series of microprocessors produced by National Semiconductor. The first member of the family, the 32016, came to market in 1982, making it the first 32-bit general-purpose microprocessor on the market. However, the 32016 contained a large number of bugs and often could not be run at its rated speed. These problems, together with competition from the similar Motorola 68000, meant that it saw almost no use in the market.

Several improved versions followed, including 1985's 32032, which was essentially a bug-fixed 32016 with an external 32-bit data bus, made possible by improved chip carriers that were becoming common in the market. However, it offered only about 50% better speed than the 32016, and was outperformed by the 32-bit Motorola 68020, released a year prior. The 32532, released in 1987, was almost twice as fast as the competing Motorola 68030, but by this time most interest in microprocessors had turned to RISC platforms, and this otherwise excellent design also saw almost no use.

National was working on further improvements in the 32732, but eventually gave up attempting to compete in the central processing unit (CPU) space. Instead, the basic 32000 architecture was combined with several support systems and relaunched as the Swordfish microcontroller. This had some success in the market before it was replaced by the CompactRISC architecture in the mid-1990s.

Design concept

NS32008 microprocessor

The NS32000 series traces its history to an effort by National Semiconductor to produce a single-chip implementation of the VAX-11 architecture.[1] The VAX is well known for its highly "orthogonal" instruction set architecture (ISA), in which any instruction can be applied to any data. For instance, an ADD instruction might add the contents of two processor registers, or one register against a value in memory, two values in memory, or use the register as an offset against an address. This flexibility was considered the paragon of design in the era of complex instruction set computers (CISC).

National took DEC to court in California to ensure the legality of the design, but when DEC had the lawsuit moved to Massachusetts, DEC's home state, the lawsuit was dropped and the Series 32000 architecture was developed instead. Although the new instruction set architecture was not VAX-11 compatible, it did retain its highly "orthogonal" design philosophy.

Architecture

NS32000 registers

General registers (32 bits, bit positions 31 to 00)
  R0        Register 0
  R1        Register 1
  R2        Register 2
  R3        Register 3
  R4        Register 4
  R5        Register 5
  R6        Register 6
  R7        Register 7

Index registers (bits 31 to 24 shown as 0000 0000)
  SP1       Stack Pointer (user)
  SP0       Stack Pointer (interrupt)
  SB        Static Base
  FP        Frame Pointer
  INTBASE   Interrupt Base

Program counter (bits 31 to 24 shown as 0000 0000)
  PC        Program Counter
  MOD       Module descriptor

Program Status Register (bit positions 15 to 00)
  PSR       Flags, from high order to low order: I P S U N Z F L T C

The processors have 8 general-purpose 32-bit registers, plus a series of special-purpose registers:

  • Frame pointer
  • Stack pointer (one each for user and supervisor modes)
  • Static base register, for referencing global variables
  • Link base register for dynamically linked modules (object orientation)
  • Program counter
  • A typical processor status register, with a low-order user byte and a high-order system byte.

(Additional system registers not listed).
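The split of the status register into a user byte and a system byte can be illustrated with a small sketch. The flag names come from the register diagram in this article; the bit positions in `PSR_FLAGS` are an assumption based on commonly published NS32000 documentation and should be checked against a datasheet. The helper names `split_psr` and `flags_set` are hypothetical.

```python
# Assumed flag bit positions (not stated in this article; verify against a datasheet).
PSR_FLAGS = {
    "C": 0, "T": 1, "L": 2, "F": 5, "Z": 6, "N": 7,  # low-order user byte
    "U": 8, "S": 9, "P": 10, "I": 11,                # high-order system byte
}


def split_psr(psr: int) -> tuple[int, int]:
    """Return (user_byte, system_byte) of a 16-bit PSR value."""
    psr &= 0xFFFF
    return psr & 0xFF, psr >> 8


def flags_set(psr: int) -> list[str]:
    """List the names of the flags that are set, lowest bit first."""
    return [name for name, bit in PSR_FLAGS.items() if (psr >> bit) & 1]


if __name__ == "__main__":
    user, system = split_psr(0x0341)
    print(hex(user), hex(system), flags_set(0x0341))
```

Under these assumptions, user-mode code would only need to look at the low byte, while the high byte holds the privileged state.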

The instruction set is very much in the CISC model, with 2-operand instructions, memory-to-memory operations, flexible addressing modes, and variable-length byte-aligned instruction encoding. Addressing modes can involve up to two displacements and two memory indirections per operand as well as scaled indexing, making the longest conceivable instruction 23 bytes. The actual number of instructions is much lower than that of contemporary RISC processors.

Unlike some other processors, autoincrement of the base register is not provided; the only exception is a "top of stack" addressing mode that pops sources and pushes destinations. Uniquely, the size of a displacement is encoded in its most significant bits: the prefixes 0, 10 and 11 precede 7-, 14- and 30-bit signed displacements respectively. (Although the processors are otherwise consistently little-endian, displacements in the instruction stream are stored in big-endian order.)
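The displacement encoding described above can be sketched as a small decoder. This is a minimal illustration built only from the prefix and size rules in the paragraph; the function name `decode_displacement` is hypothetical, and any reserved bit patterns a real processor might reject are not modelled.

```python
def decode_displacement(stream: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one displacement; return (signed value, bytes consumed)."""
    first = stream[offset]
    if first & 0x80 == 0:        # prefix 0: 7-bit value in one byte
        size, bits = 1, 7
    elif first & 0xC0 == 0x80:   # prefix 10: 14-bit value in two bytes
        size, bits = 2, 14
    else:                        # prefix 11: 30-bit value in four bytes
        size, bits = 4, 30
    if offset + size > len(stream):
        raise ValueError("truncated displacement")
    # Displacements are stored big-endian in the instruction stream.
    raw = int.from_bytes(stream[offset:offset + size], "big")
    value = raw & ((1 << bits) - 1)      # strip the length prefix
    if value & (1 << (bits - 1)):        # sign-extend from the top value bit
        value -= 1 << bits
    return value, size


if __name__ == "__main__":
    print(decode_displacement(bytes([0x05])))
    print(decode_displacement(bytes([0xBF, 0xFF])))
    print(decode_displacement(bytes([0xC0, 0x00, 0x01, 0x00])))
```

Because the length is carried in the first byte, a decoder can tell how many bytes to fetch before it knows anything else about the operand.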

General-purpose operands are specified using a 5-bit field. To this can be added an index byte (specifying the index register and a 5-bit base addressing mode), and up to two variable-length displacements per operand.
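As a rough sketch of how an index byte might be taken apart: the text gives only the field sizes, so the layout used here (the 5-bit base field in the upper bits, the 3-bit index register number in the lower bits) is an assumption, and `split_index_byte` is a hypothetical helper name.

```python
def split_index_byte(index_byte: int) -> tuple[int, int]:
    """Return (base_field, index_register) under the assumed bit layout."""
    if not 0 <= index_byte <= 0xFF:
        raise ValueError("index byte must fit in 8 bits")
    base_field = index_byte >> 3       # assumed: upper five bits
    index_register = index_byte & 0x7  # assumed: lower three bits, R0 to R7
    return base_field, index_register


if __name__ == "__main__":
    print(split_index_byte(0b10101_011))
```

Three bits are exactly enough to name one of the eight general-purpose registers, which is why the remaining five bits can repeat a full operand field.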

32016

The first chip in the series was originally referred to as the 16032, but later renamed 32016 to emphasize its 32-bit internals. This contrasts it with its primary competitor in this space, 1979's Motorola 68000 (68k). The 68k used 32-bit instructions and registers, but its arithmetic logic unit (ALU), which controls much of the overall processing task, was only 16-bit. This meant it had to cycle 32-bit data through the ALU twice to complete an operation. In contrast, the NS32000 has a 32-bit ALU, so that 16-bit and 32-bit instructions take the same time to complete.
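The cost of a 16-bit ALU on 32-bit data can be illustrated with a sketch that adds two 32-bit values in two 16-bit passes, carrying between them. This is only an arithmetic illustration of the point above, not a model of either chip's real datapath; the function names are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """One pass through a 16-bit adder: return (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_two_pass(a: int, b: int) -> tuple[int, int]:
    """Add two 32-bit values using two 16-bit passes."""
    low, carry = add16(a, b)                   # pass 1: low halves
    high, carry = add16(a >> 16, b >> 16, carry)  # pass 2: high halves plus carry
    return (high << 16) | low, carry


if __name__ == "__main__":
    print(add32_two_pass(0x0001FFFF, 0x00000001))
```

A 32-bit ALU produces the same result in a single pass, which is why 16-bit and 32-bit operations can take the same time on such a design.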

The 32016 first shipped in 1982 in a 48-pin DIP package. It may have been the first 32-bit chip to reach mass production and sale (at least according to National's marketing). Although this post-dates the 68k by about two years, the 68k was not yet being widely used in the market and the 32016 generated significant interest. Unfortunately, the early versions were filled with bugs and could rarely be run at their rated speed. By 1984, after two years, the errata list still contained items specifying uncontrollable conditions that would result in the processor coming to a halt, forcing a reset.

National changed its design methodology to make it possible to get the part into production, and a design system based on the language "Z" was co-developed with the University of Tel-Aviv, close to the "NSC" design centre in Herzliya, Israel. The "Z" language is similar to today's Verilog and VHDL, but has a Pascal-like syntax and is optimized for two-phase clock designs. However, by the time the fruits of these efforts were being felt in the design, numerous 68k machines were already on the market, notably the Apple Macintosh, and the 32016 never saw widespread use.

The 32016 has a 16-bit external data bus, a 24-bit external address bus, and a full 32-bit instruction set. It also includes a coprocessor interface, allowing coprocessors such as FPUs and MMUs to be attached as peers to the main processor. The MMU is based on demand-paged virtual memory, its most unusual feature compared with the segmented memory approach used by the competition; demand paging has since become the standard for microprocessor design. The architecture supports an instruction restart mechanism on a page fault, which is much cleaner than the Motorola approach of dumping the internal state on a page fault, state that has to be read back before the instruction can continue.

NS32016 microprocessor
NS32081 FPU

The instruction set was often compared to that of the 68k, but NSC employees rejected the comparison; one of the key marketing phrases of the time was "Elegance is Everything", contrasting the highly orthogonal Series 32000 with the "kludge". One key difference is Motorola's use of separate address registers and data registers, with instructions working on only one kind or the other. The Series 32000 has general-purpose registers.

32032

NS32032 microprocessor

The 32032 was introduced in 1984. It is almost completely compatible with the 32016, but features a 32-bit data bus (although keeping the 24-bit address bus) for somewhat faster performance. There was also a 32008, a 32016 with a data bus cut down to 8 bits wide for low-cost applications. It is philosophically similar to the MC68008, and equally unpopular.

National also produced a series of related support chips, such as the NS32081 Floating Point Unit (FPU), NS32082 Memory Management Unit (MMU), NS32203 Direct Memory Access (DMA) controller and NS32202 Interrupt Controller. With the full set plus memory chips and peripherals, it was feasible to build a 32-bit computer system capable of supporting modern multi-tasking operating systems, something that had previously been possible only on expensive minicomputers and mainframes.

32332, 32532

In 1985, National Semi introduced the NS32332, a much-improved version of the 32032. From the datasheet, the enhancements include "the addition of new dedicated addressing hardware (consisting of a high speed ALU, a barrel shifter and an address register), a very efficient increased (20 bytes) instruction prefetch queue, a new system/memory bus interface/protocol, increased efficiency slave processor protocol and finally enhancements of microcode." There was also a new NS32382 MMU, NS32381 FPU and the (very rare) NS32310 interface to a Weitek FPA. Taken together, these enhancements made the NS32332 only about 50 percent faster than the original NS32032, which left it slower than its main competitor, the MC68020.

National Semi introduced the NS32532 in early 1987. Running at 20, 25 and 30 MHz, it was a complete redesign of the internal implementation with a five-stage pipeline, an integrated Cache/MMU and improved memory performance, making it about twice as fast as the competing MC68030 and i80386. By this stage RISC architectures were starting to make inroads, and the main competitors became the now equally dead AM29000 and MC88000, which was considered faster than the NS32532. For floating-point, the NS32532 used the existing NS32381 or the NS32580 interface to a Weitek FPA.[2] The NS32532 was the basis of the PC532, one of the few fully realized "public domain" hardware projects (that is, one resulting in an actual, useful machine running a real operating system, in this case Minix or NetBSD).

The semi-mythical NS32732 (sometimes called NS32764) was originally envisioned as the high-performance successor to the NS32532. It never came to market.

Swordfish

A derivative of the NS32732 called Swordfish was aimed at embedded systems and arrived in about 1990. Swordfish has an integrated floating point unit, timers, DMA controllers and other peripherals not normally available in microprocessors. It has a 64-bit data bus and is internally overclocked from 25 to 50 MHz. The chief architect of the Swordfish is Donald Alpert, who went on to manage the architectural team designing the Pentium. The Pentium internal microarchitecture is similar to the preceding Swordfish.

The focus of Swordfish was high-end PostScript laser printers, and performance was exceptional at the time. Competing solutions could render about one new page per minute, but the Swordfish demo unit would print out sixteen pages per minute, limited only by the laser-engine mechanics. On each page it would print out how much time it had spent idling, waiting for the engine to complete.

The Swordfish die is huge, and it was eventually decided to drop the project altogether, so the product never went into production. The lessons from the Swordfish were used for the CompactRISC designs. In the beginning, there were both a CompactRISC-32 and a CompactRISC-16, designed using "Z". National never brought a chip to the market with the CompactRISC-32 core. National's Research department worked with the University of Michigan to develop the first synthesizable Verilog model, and Verilog was used from the CR16C onwards.

Others

Versions of the older NS32000 line for low-cost products, such as the NS32CG16, NS32CG160, NS32FV16, NS32FX161, NS32FX164 and the NS32AM160/1/3, all based on the NS32CG16, were introduced from 1987 onwards. These processors had some success in the laser printer and fax market, despite intense competition from AMD and Intel RISC chips.

The NS32CG16 is especially notable. The key differences between it and the NS32C016 are the integration of the expensive TCU (Timing Control Unit), which generates the needed two-phase clock from a crystal, and the removal of floating-point coprocessor support. The latter freed up microcode space for the useful BitBLT instruction set, which significantly improves performance in laser printer operations, making this 60,000-transistor chip faster than the 200,000-transistor MC68020. The NS32CG160 is the CG16 with timers and DMA peripherals, while the NS32FV/FX16x chips have extra DSP functionality on top of the CG16 BitBLT core for the fax and answering machine market.

These chips were later complemented by the NS32532-based NS32GX32. Unlike the previous chips, it had no extra hardware: the NS32GX32 is the NS32532 without the MMU, sold at an attractive price for embedded systems. In the beginning, this was just a re-marked chip. It is unclear if the chip was redesigned for lower-cost production.

Datasheets exist for an NS32132, apparently designed for multiprocessor systems. This is the NS32032 extended with an arbiter. The bus usage of the NS32032 is about 50 percent, owing to its very compact instruction set, or its very slow pipeline as competitors would phrase it. The NS32132 chip allows a pair of CPUs to be connected to the same memory system, without much change of the PCB. Prototype systems were built by Diab Data AB in Sweden, but did not perform as well as the single-CPU MC68020 system designed by the same company.

Machines using the NS32000 series

  • Acorn Cambridge Workstation – NS32016 (with 6502 host)
  • Intermec (previously A-Tech and then UBI) Label Printer – NS32CG16
  • BBC Micro – NS32016 Second Processor
  • Canon LBP-8 Mark III Laser Printer – NS32CG16
  • Whitechapel MG-1 – NS32016
  • Whitechapel MG200 – NS32332
  • Opus – NS16032 PC Add-On Board
  • Sequent Balance – NS32016, NS32032 and NS32332 multiprocessor
  • ETH Zurich Ceres workstation – NS32032
  • ETH Zurich Ceres-2 workstation – NS32532
  • ETH Zurich Ceres-3 workstation – NS32GX32
  • Heurikon VME532 – NS32532 VME Card (with cache)
  • PC532 – NS32532
  • Tolerant Systems Eternity Series – NS32032 w/ NS32016 I/O processor
  • National Semiconductor ICM-3216 – NS32016
  • National Semiconductor ICM-332-1 – NS32332 w/ NS32016 I/O processor
  • National Semiconductor SYS32/20 – NS32016 PC add-on board w/ Unix
  • Encore Multimax – NS32032, NS32332 and NS32532 Multiprocessor
  • Trinity College Workstation – NS32332
  • Tektronix 6130 & 6250 Workstation – NS32016 and NS32032
  • Siemens PC-MX2 – NS32016
  • Siemens MX300-05/-10/-15/-30 – NS32332 (-05/-10) or NS32532 (-15/-30) under SINIX (MX300-55 and later use i486)
  • Siemens MX500-75/-85 – NS32532 (2-8x CPUs; Sequent Boards / MX500-90 uses 2-12x i486)
  • Compupro 32016 – NS32016 S-100 Card
  • Symmetric Computer Systems S/375 – NS32016, used to cross-develop 386BSD
  • Syte Information Technology – Unix graphics workstation
  • General Robotics Corp. Python – NS32032 & NS32016 Q-Bus card
  • Teklogix 9020 network controller – NS32332
  • Teklogix 9200 network controller – NS32CG160
  • Labtam Unix System – NS32032 and NS32332
  • Lauterbach Incircuit Emulator ICE (System Controller 32-bit, first version in 1996, max 16 MB ZIP20-RAM, Z180 to serve Ethernet)
  • IBM RT PC – Some early models used the NS32081 FPU as a coprocessor for the IBM ROMP microprocessor

Legacy

In June 2015, Udo Möller released a complete Verilog implementation of an NS32000 processor on OpenCores.[3] Fully software-compatible with an NS32532 CPU with NS32381 FPU, it is significantly faster when implemented on an FPGA,[4] both operating at a higher clock rate and using fewer cycles per instruction.

References

  • Trevor G. Marshall, George Scolaro and David L. Rand: The Definicon DSI-32 Coprocessor. Micro Cornucopia, Aug/September 1985.
  • Trevor G. Marshall, George Scolaro and David L. Rand: The DSI-32 Coprocessor Board. Part 1, BYTE, August 1985, pp 120–136; Part 2, BYTE, September 1985, p 116.

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.