The evolution of x86 processors began with the Intel 8086 microprocessor, introduced in 1978. It is worth noting that the first IBM machines, the IBM PC and IBM XT, used the cheaper 8088 version with an 8-bit external data bus. The 8088 was software compatible with the 8086, and its narrower external data bus reduced the cost of the whole computer. IBM later used the 8086 in personal computers named PS/2. Its successor, the 80286, was used in the IBM PS/2 286 machine and in the IBM AT personal computer, an extended version of the XT that became the standard architecture for a variety of personal computers designed by many vendors. This started the history of personal computers based on the x86 family of processors. In this section, we briefly describe the most important features of x86 processor models and the differences between them.
The 8086 is a 16-bit processor, which means it uses 16-bit registers. By using segmentation in the so-called real addressing mode together with a 20-bit address bus, it can access up to 1 MB of memory. The base clock frequency of this model is 5 - 10 MHz. It implements a three-stage instruction pipeline (loosely pipelined), which allows up to three instructions to be in progress at the same time. Alongside the processor, Intel designed a whole set of supporting integrated circuits that made it possible to build a complete computer. One of these chips is the 8087, a math coprocessor now known as the Floating Point Unit (FPU).
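The way segmentation and the 20-bit address bus combine can be illustrated with a short C sketch. The text above does not spell out the formula; the sketch assumes the commonly documented real-mode rule that the 16-bit segment value is shifted left by four bits and added to the 16-bit offset, with the result limited to 20 address lines. The function name `real_mode_physical` is hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Sketch of 8086 real-mode address translation (assumed rule):
 * physical = segment * 16 + offset, limited to 20 address bits. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    /* Shift the segment left by 4 bits (multiply by 16) and add the offset. */
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    /* Only 20 address lines exist, so the sum wraps around at 1 MB. */
    return linear & 0xFFFFFu;
}
```

For example, segment `0x1234` with offset `0x0010` gives the physical address `0x12350`, and the highest segment `0xFFFF` with offset `0x0010` wraps around to address `0`.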
The 80186 includes additional hardware units and is designed to reduce the number of integrated circuits required to build a computer. Its clock generator operates at 6 - 20 MHz. The 80186 implements a few additional instructions and is considered a faster version of the 8086.
The 80286 has an extended 24-bit address bus and can theoretically use 16 MB of memory. In this model, Intel tried to introduce protected memory management with a theoretical address space of 1 GB. Due to problems with compatibility and with the efficiency of executing software written for the 8086, the extended features were used only by niche operating systems, of which IBM OS/2 is the best known.
The 80386 is the first 32-bit processor designed by Intel for personal computers. It implements protected and virtual memory management that was successfully used in operating systems, including Microsoft Windows. Architectural elements of the processor (registers, buses) are extended to 32 bits, which allows addressing up to 4 GB of memory. The 80386 adds three stages to the instruction pipeline, allowing up to six instructions to be in progress simultaneously. Although no internal cache was implemented, this chip can be connected to external cache memory. Intel uses the name IA-32 for this architecture and instruction set. The original name 80386 was later replaced with i386.
The i486 is an improved version of the i386 processor. Intel combined the main CPU, the FPU (except in the i486SX version) and the memory controller in one chip, together with 8 or 16 kB of cache memory. The cache is 4-way associative and shared by instructions and data. It is a tightly pipelined processor with a five-stage instruction pipeline in which every stage takes one clock cycle. The clock frequency ranges from 16 to 100 MHz.
Pentium is the successor of the i486 processor. It is still a 32-bit processor but implements a dual integer pipeline, which makes it the first superscalar processor in the x86 family. It can operate with a clock frequency ranging from 60 to 300 MHz. The improved microarchitecture also includes separate caches for instructions and data, and a branch prediction unit, which helps to reduce the impact of pipeline invalidation caused by conditional jump instructions. The cache memory is 8 kB for instructions and 8 kB for data, both 2-way associative.
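To make the cache parameters above more concrete, the following C sketch computes how many sets a set-associative cache contains. It is only an illustration: the text gives the capacity and associativity, but not the cache line size, so the 32-byte line used in the example below is an assumption, and the function name `cache_sets` is hypothetical.

```c
#include <stddef.h>

/* Number of sets in a set-associative cache.
 * size_bytes : total capacity, e.g. 8 * 1024 for an 8 kB cache
 * ways       : associativity, e.g. 2 for a 2-way associative cache
 * line_bytes : cache line size (ASSUMED by the caller, not given in the text)
 * Returns 0 when ways or line_bytes is zero, to avoid dividing by zero. */
size_t cache_sets(size_t size_bytes, size_t ways, size_t line_bytes)
{
    if (ways == 0 || line_bytes == 0)
        return 0;
    /* Each set holds `ways` lines of `line_bytes` bytes each. */
    return size_bytes / (ways * line_bytes);
}
```

Under the assumed 32-byte line size, an 8 kB 2-way associative cache such as the Pentium data cache would be organised as `cache_sets(8 * 1024, 2, 32)`, that is, 128 sets of two lines each.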
Pentium MMX (MultiMedia Extension) is the first processor in the family to implement SIMD instructions. It uses the physical FPU registers for 64-bit MMX vector operations. The clock speed is 120 - 233 MHz. Intel also improved the cache memory compared to the Pentium: both the instruction and the data cache are twice as big (16 kB each), and they are 4-way associative.
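The idea of a 64-bit vector operation can be sketched in portable C without using any MMX instructions. The example below emulates a packed addition of eight unsigned bytes held in one 64-bit value, where each byte lane wraps around independently and no carry passes between lanes. This is an illustration of the SIMD principle under those assumptions, not the processor's implementation, and the function name `packed_add_u8` is hypothetical.

```c
#include <stdint.h>

/* Emulates a packed add of eight 8-bit lanes stored in 64-bit values.
 * Each lane is added separately and wraps modulo 256, so a carry out of
 * one lane never affects the neighbouring lane. */
uint64_t packed_add_u8(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 8; ++lane) {
        uint8_t x = (uint8_t)(a >> (8 * lane));   /* extract lane from a */
        uint8_t y = (uint8_t)(b >> (8 * lane));   /* extract lane from b */
        uint8_t sum = (uint8_t)(x + y);           /* wraps modulo 256 */
        result |= (uint64_t)sum << (8 * lane);    /* store lane in result */
    }
    return result;
}
```

Adding `0x1010101010101010` to `0x0102030405060708` in this way yields `0x1112131415161718`. A SIMD unit performs all eight lane additions with a single instruction, whereas this sketch needs a loop.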
Pentium Pro implements a new architecture (P6) with many innovative units, organised in a 14-stage pipeline, which enhances the overall performance. The advanced instruction decoder generates micro-operations, which are RISC-like translations of the x86 instructions. It can produce up to two micro-operations representing simple x86 instructions and up to six micro-operations from the microcode sequencer, which stores microcode for complex x86 instructions. Micro-operations are stored in a buffer, called the reorder buffer or instruction pool, and are assigned by the reservation station to a chosen execution unit. The Pentium Pro has six execution units. An important feature of the P6 architecture is that instructions can be executed out of order, with flexible use of physical registers known as register renaming. Together, these techniques allow more than one instruction to be executed per clock cycle. Additionally, Pentium Pro implements a new, extended paging unit, which allows addressing 64 GB of memory. The L1 instruction and data caches are 8 kB each. The physical package also includes the L2 cache, assembled as a separate silicon die. This made the processor too expensive for the consumer market, so it was mainly used in servers and supercomputers.
Pentium II builds on the experience Intel gathered in developing the Pentium Pro processor and the MMX extension to the Pentium. It combines the P6 architecture with SIMD instructions and operates at a maximum of 450 MHz. The L1 cache size is increased to 32 kB (16 kB data + 16 kB instructions). Intel decided to move the L2 cache out of the processor's enclosure and mount it as a separate chip on the same printed circuit board (PCB). As a result, Pentium II takes the form of a PCB module rather than a single integrated circuit like the previous models. Although it offers slightly worse performance, this approach made it much cheaper than Pentium Pro, and the multimedia instructions made it more attractive for the consumer computer market.
Pentium III is very similar to Pentium II. The main enhancement is the addition of the Streaming SIMD Extensions (SSE) instruction set to accelerate SIMD floating point calculations. Thanks to improvements in the production process, it was also possible to raise the clock frequency to the range of 400 MHz to 1.4 GHz.
Pentium 4 is the last 32-bit processor developed by Intel. Some late models also implement the 64-bit extension. It is based on the NetBurst architecture, which was developed as an improvement to the P6 architecture. An important modification is the relocation of the instruction cache from the input to the output of the instruction decoder. As a result, the cache, named the trace cache, stores micro-operations instead of instructions. To increase market impact, Intel decided to increase the number of pipeline stages, using the term “hyperpipelining” to describe the strategy of creating a very deep pipeline. A deep pipeline could lead to higher clock speeds, and Intel built its marketing strategy around them. The pipeline of the initial Pentium 4 model has 20 stages, significantly more than its predecessors, and the Pentium 4 Prescott has a pipeline of 31 stages. Operating frequency ranges from 1.3 GHz to 3.8 GHz. Intel also implemented Hyper-Threading technology in the Pentium 4 HT version, which provides two virtual (logical) cores in one physical processor that share the workload when possible. With Pentium 4, Intel returned to a single-chip package for both the processor core and the L2 cache. Pentium 4 extends the instruction set with SSE2 instructions, and Pentium 4 Prescott adds SSE3. The NetBurst architecture suffered from high heat emission, causing problems with heat dissipation and cooling.
The AMD Opteron is the first processor to support the 64-bit instruction set architecture, initially known as AMD64 and more generally as x86-64. Its versions include the initial single-core and later multi-core processors. The Opteron implements the x86, MMX, SSE and SSE2 instructions known from Intel processors, as well as the AMD multimedia extension known as 3DNow!.
Pentium D is a multicore 64-bit processor based on the NetBurst architecture known from Pentium 4. Each unit implements two processor cores.
After facing problems with heat dissipation in processors based on the NetBurst microarchitecture, Intel designed the Core microarchitecture, derived from P6. One of its first implementations is the Pentium Dual-Core. After some time, Intel changed the name of this processor line back to Pentium to avoid confusion with the Core and Core 2 processors. There is a vast range of Core processor models with different cache sizes and numbers of cores, offering lower or higher performance. From the perspective of this book, we can think of them as modern, advanced and efficient 64-bit processors that implement all the instructions we consider. Additional information can be found in many internet sources. One of them is the Intel website[1], and another commonly used one is Wikipedia[2].
All Intel Core processors are based on the Core microarchitecture. Intel uses different naming schemes for these processors. Initially, the names represented the number of physical processor cores in one chip: Core Duo has two physical cores, while Core Quad has four.
An improved version of the Core microarchitecture followed. Its naming scheme is similar to that of the Core processors.
Intel then changed the naming scheme. Since then, the name has given no strict information about the number of cores inside the chip. Instead, it reflects the processor's overall performance.
The newest naming scheme at the time of writing this book, introduced in 2023, drops the “i” letter and introduces “Ultra” versions of high-performance processors. There are Core 3 - Core 9 processors as well as Core Ultra 3 - Core Ultra 9.
Model | Year | Class | Address bus | Max memory | Clock freq. | Architecture |
---|---|---|---|---|---|---|
8086 | 1978 | 16-bit | 20 bits | 1 MB | 5 - 10 MHz | x86-16 |
80186 | 1982 | 16-bit | 20 bits | 1 MB | 6 - 20 MHz | x86-16 |
80286 | 1982 | 16-bit | 24 bits | 16 MB | 4 - 25 MHz | x86-16 |
80386 | 1985 | 32-bit | 32 bits | 4 GB | 12.5 - 40 MHz | IA-32 |
i486 | 1989 | 32-bit | 32 bits | 4 GB | 16 - 100 MHz | IA-32 |
Pentium | 1993 | 32-bit | 32 bits | 4 GB | 60 - 200 MHz | IA-32 |
Pentium MMX | 1996 | 32-bit | 32 bits | 4 GB | 120 - 233 MHz | IA-32 |
Pentium Pro | 1995 | 32-bit | 36 bits | 64 GB | 150 - 200 MHz | IA-32 |
Pentium II | 1997 | 32-bit | 36 bits | 64 GB | 233 - 450 MHz | IA-32 |
Pentium III | 1999 | 32-bit | 36 bits | 64 GB | 400 MHz - 1.4 GHz | IA-32 |
Pentium 4 | 2000 | 32-bit | 36 bits | 64 GB | 1.3 GHz - 3.8 GHz | IA-32 |
AMD Opteron | 2003 | 64-bit | 40 bits | 1 TB | 1.4 GHz - 3.5 GHz | x86-64 |
Pentium 4 Prescott | 2004 | 64-bit* | 36 bits | 64 GB | 1.3 GHz - 3.8 GHz | IA-32, Intel64* |
Pentium D | 2005 | 64-bit | | | 2.66 GHz - 3.73 GHz | Intel64 |
Pentium Dual-Core | 2007 | 64-bit | | | 1.3 GHz - 3.4 GHz | Intel64 |
Core 2 | 2006 | 64-bit | 36 bits | 64 GB | 1.06 GHz - 3.5 GHz | Intel64 |
* in some models
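The relation between the "Address bus" and "Max memory" columns can be checked with a small C sketch: an address bus of n bits can select 2^n distinct byte addresses. The function name `max_addressable_bytes` is hypothetical, and the sketch assumes byte-addressable memory and a bus width below 64 bits.

```c
#include <stdint.h>

/* Maximum number of bytes addressable with the given address bus width.
 * Assumes byte-addressable memory. Widths of 64 bits or more cannot be
 * represented in a uint64_t result, so 0 is returned for them. */
uint64_t max_addressable_bytes(unsigned address_bits)
{
    if (address_bits >= 64)
        return 0;
    return (uint64_t)1 << address_bits;   /* 2 to the power address_bits */
}
```

With this function, 20 bits give 1 MB (8086), 24 bits give 16 MB (80286), 32 bits give 4 GB (80386), 36 bits give 64 GB (Pentium Pro) and 40 bits give 1 TB (AMD Opteron), in agreement with the table.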