Evolution of x86 Processors

The evolution of x86 processors began with the Intel 8086 microprocessor, introduced in 1978. It is worth noting that the first IBM machines, the IBM PC and IBM XT, used the cheaper 8088 version. The 8088 was software compatible with the 8086, but its 8-bit external data bus reduced the cost of the whole computer. IBM later used the 8086 in personal computers of the PS/2 line. Its successor, the 80286, was used in the IBM AT personal computer, the extended version of the XT, and in the IBM PS/2 286 machine. The AT became the standard architecture for a variety of personal computers designed by many vendors. This started the history of personal computers based on the x86 family of processors. In this section, we briefly describe the most important features of x86 processor models and the differences between them.

8086

The 8086 is a 16-bit processor, which means it uses 16-bit registers. Using segmentation in the so-called real addressing mode and a 20-bit address bus, it can access up to 1 MB of memory. The base clock frequency of this model is 5 - 10 MHz. It implements a three-stage, loosely pipelined instruction pipeline, which allows up to three instructions to be in progress at the same time. Alongside the processor, Intel designed a whole set of supporting integrated circuits, which made it possible to build a complete computer. One of these chips is the 8087, a math coprocessor known today as the Floating Point Unit.
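
The real-mode address calculation mentioned above can be sketched in a few lines. This is only an illustration, not code from the book: the function name real_mode_address is hypothetical, and the sketch assumes the 8086 behaviour in which the 16-bit segment is shifted left by four bits, the 16-bit offset is added, and the result wraps around at the 1 MB boundary because only 20 address lines exist.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair.

    Hypothetical helper for illustration; models the 8086 only.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment is multiplied by 16 (shifted left by 4 bits), then the
    # offset is added. With 20 address lines the sum wraps at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
    print(hex(real_mode_address(0xFFFF, 0x0010)))  # 0x0 (wraps past 1 MB)
```

The second call shows why many different segment:offset pairs can point to the same physical byte, and why the highest segment reaches just past the end of the 1 MB space.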

80186

This model integrates additional hardware units and is designed to reduce the number of integrated circuits required to build a computer. Its clock generator operates at 6 - 20 MHz. The 80186 implements a few additional instructions. It is considered a faster version of the 8086.

80286

The 80286 has an extended 24-bit address bus and can theoretically use 16 MB of memory. In this model, Intel tried to introduce protected memory management with a theoretical address space of 1 GB. Due to problems with compatibility and with the efficiency of executing software written for the 8086, the extended features were used only by niche operating systems, with IBM OS/2 as the most recognised.

80386

The 80386 is the first 32-bit processor designed by Intel for personal computers. It implements protected and virtual memory management that was successfully used in operating systems, including Microsoft Windows. Architectural elements of the processor (registers, buses) are extended to 32 bits, which allows addressing up to 4 GB of memory. The 80386 extends the instruction pipeline by three stages, allowing up to six instructions to be in progress simultaneously. Although no internal cache is implemented, the chip can be connected to external cache memory. Intel uses the name IA-32 for this architecture and instruction set. The original name 80386 was later replaced with i386.

i486

The i486 is an improved version of the i386 processor. Intel combined in one chip the main CPU, the FPU (except in the i486SX version) and a memory controller, including 8 or 16 kB of cache memory. The cache is 4-way associative and shared by instructions and data. It is a tightly pipelined processor that implements a five-stage instruction pipeline in which every stage takes one clock cycle. The clock frequency ranges from 16 to 100 MHz.
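
To see why a pipeline with one-cycle stages helps, the timing can be estimated with a simple model. The sketch below is an idealised illustration only: it assumes no stalls, no branch penalties and one clock per stage, which real processors do not achieve, and the function names are hypothetical.

```python
def pipeline_cycles(instructions: int, stages: int) -> int:
    """Clock cycles needed by an ideal pipeline (no stalls assumed)."""
    if instructions <= 0:
        return 0
    # The first instruction needs `stages` cycles to pass through;
    # every following instruction completes one cycle later.
    return stages + instructions - 1


def ideal_speedup(instructions: int, stages: int) -> float:
    """Speedup over executing each instruction start to finish in turn."""
    unpipelined = instructions * stages
    return unpipelined / pipeline_cycles(instructions, stages)


if __name__ == "__main__":
    # Five stages, as in the i486 description above.
    print(pipeline_cycles(100, 5))
    print(round(ideal_speedup(100, 5), 2))
```

For long instruction sequences the ideal speedup approaches the number of stages, which is the reasoning behind deeper pipelines in later models.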

Tightly pipelined means that all stages of the pipeline perform their duties within the same time period. Loosely pipelined implies that some kind of buffer is used between pipeline stages to decouple the units and allow them to work more independently.
Set-associative is the method of placing memory regions in the cache. It represents a compromise between the complexity of the design of the cache controller and flexibility. In general, more “ways” (e.g. 4-way over 2-way) means more flexibility at the cost of complexity.
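
As a rough illustration of set-associative placement, the sketch below splits an address into a tag, a set index and a byte offset. It is a simplified model rather than a description of any particular cache controller: the function name is hypothetical, and the 16-byte line size used in the example is an assumption made for this sketch.

```python
def split_address(address: int, cache_bytes: int, ways: int, line_bytes: int):
    """Return (tag, set_index, offset) for a set-associative cache.

    Hypothetical helper; all sizes are assumed to be powers of two.
    """
    num_sets = cache_bytes // (ways * line_bytes)
    offset = address % line_bytes                  # byte within the line
    set_index = (address // line_bytes) % num_sets # which set to search
    tag = address // (line_bytes * num_sets)       # identifies the line
    return tag, set_index, offset


if __name__ == "__main__":
    # 8 kB, 4-way cache with an assumed 16-byte line: 128 sets.
    tag, set_index, offset = split_address(0x12345678, 8 * 1024, 4, 16)
    print(hex(tag), set_index, offset)
```

A given address can be stored only in the set selected by its index, but in any of the "ways" of that set, so more ways mean fewer conflicts between addresses that share an index.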

Pentium

Pentium is the successor of the i486 processor. It is still a 32-bit processor but implements a dual integer pipeline, which makes it the first superscalar processor in the x86 family. It can operate with a clock frequency ranging from 60 to 200 MHz. The improved microarchitecture also includes separate caches for instructions and data, and a branch prediction unit, which helps to reduce the impact of pipeline invalidation caused by conditional jump instructions. The cache memory is 8 kB for instructions and 8 kB for data, both 2-way associative.

Pentium MMX

Pentium MMX (MultiMedia Extension) is the first processor in the family that implements SIMD instructions. It uses the physical FPU registers for 64-bit MMX vector operations. The clock speed is 120 - 233 MHz. Intel also improved the cache memory compared to the Pentium. Both the instruction and data caches are twice as big (16 kB each), and they are 4-way associative.

SIMD stands for Single Instruction Multiple Data. Such instructions perform the same operation on more than one data element in a single instruction. Intel introduced them in Pentium MMX and continued in later processors with the SSE and AVX instructions, naming them vector instructions.
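
The idea can be illustrated by emulating one packed operation in software. The sketch below models the effect of a packed 16-bit addition on a 64-bit value, in the manner of the MMX PADDW instruction with wraparound arithmetic. It is only a model of the result: real hardware processes all four lanes at once, whereas this hypothetical function loops over them.

```python
LANES = 4        # four 16-bit elements fit in one 64-bit MMX register
LANE_BITS = 16
LANE_MASK = 0xFFFF


def packed_add_words(a: int, b: int) -> int:
    """Add four 16-bit lanes of two 64-bit values with wraparound."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        x = (a >> shift) & LANE_MASK
        y = (b >> shift) & LANE_MASK
        # Each lane is added independently; a carry out of one lane
        # is discarded and never reaches the neighbouring lane.
        result |= ((x + y) & LANE_MASK) << shift
    return result


if __name__ == "__main__":
    total = packed_add_words(0x0001_0002_0003_FFFF, 0x0001_0001_0001_0001)
    print(f"{total:016x}")  # 0002000300040000
```

The lowest lane overflows from 0xFFFF to 0x0000 without disturbing the lane above it, which is the key difference from an ordinary 64-bit addition.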

Pentium Pro

Pentium Pro implements a new architecture (P6) with many innovative units, organised in a 14-stage pipeline, which enhances the overall performance. The advanced instruction decoder generates micro-operations, RISC-like translations of the x86 instructions. It can produce up to two micro-operations representing simple x86 instructions and up to six micro-operations from the microcode sequencer, which stores microcode for complex x86 instructions. Micro-operations are stored in a buffer, called the reorder buffer or instruction pool, and are assigned by the reservation station to a chosen execution unit. In Pentium Pro, there are six execution units. An important feature of the P6 architecture is that instructions can be executed out of order, with flexible use of physical registers known as register renaming. All these techniques allow more than one instruction to be executed per clock cycle. Additionally, Pentium Pro implements a new, extended paging unit, which allows addressing 64 GB of memory. The L1 instruction and data caches are 8 kB each. The processor package also includes the L2 cache, assembled as a separate silicon die. This made the processor too expensive for the consumer market, so it was mainly used in servers and supercomputers.

Pentium II

Pentium II is a processor based on the experience Intel gathered in developing the Pentium Pro and the MMX extension to the Pentium. Pentium II combines the P6 architecture with SIMD instructions and operates at a maximum of 450 MHz. The L1 cache size is increased to 32 kB (16 kB data + 16 kB instructions). Intel decided to move the L2 cache out of the processor's enclosure and mount it as a separate chip on the same PCB. As a result, Pentium II has the form of a PCB module rather than a single integrated circuit like previous models. Although it offered slightly worse performance, this approach made it much cheaper than Pentium Pro, and the multimedia instructions made it more attractive for the consumer computer market.

Pentium III

Pentium III is very similar to Pentium II. The main enhancement is the addition of the Streaming SIMD Extensions (SSE) instruction set to accelerate SIMD floating point calculations. Improvements in the production process also made it possible to increase the clock frequency to the range of 400 MHz to 1.4 GHz.

Pentium 4

Pentium 4 is the last 32-bit processor developed by Intel. Some late models also implement the 64-bit extension. It is based on the NetBurst architecture, which was developed as an improvement to the P6 architecture. The important modification is moving the instruction cache from the input to the output of the instruction decoder. As a result, the cache, named the trace cache, stores micro-operations instead of instructions. To increase the market impact, Intel decided to increase the number of pipeline stages, using the term “hyperpipelining” to describe the strategy of creating a very deep pipeline. A deep pipeline allows higher clock speeds, and Intel built its marketing strategy on them. The pipeline of the initial Pentium 4 model is significantly deeper than that of its predecessors, having 20 stages. The Pentium 4 Prescott processor has a pipeline of 31 stages. The operating frequency ranges from 1.3 GHz to 3.8 GHz. Intel also implemented Hyper-Threading technology in the Pentium 4 HT version to provide two virtual (logical) cores in one physical processor, which share the workload between them when possible. With Pentium 4, Intel returned to a single chip package for both the processor core and the L2 cache. Pentium 4 extends the instruction set with SSE2 instructions, and Pentium 4 Prescott with SSE3. The NetBurst architecture suffered from high heat emission, causing problems with heat dissipation and cooling.

AMD Opteron

Opteron is the first processor to support the 64-bit instruction set architecture, known at the beginning as AMD64 and in general as x86-64. Its versions include the initial single-core and later multi-core processors. The AMD Opteron implements the x86, MMX, SSE and SSE2 instructions known from Intel processors, as well as the AMD multimedia extension known as 3DNow!.

Pentium D

Pentium D is a multicore 64-bit processor based on the NetBurst architecture known from Pentium 4. Each processor package contains two processor cores.

Core Processors

  • Pentium Dual-Core

After facing problems with heat dissipation in processors based on the NetBurst microarchitecture, Intel designed the Core microarchitecture, derived from P6. One of the first implementations is Pentium Dual-Core. After some time, Intel changed the name of this processor line back to Pentium to avoid confusion with Core and Core 2 processors. There is a vast range of Core processor models with different cache sizes and numbers of cores, offering lower or higher performance. From the perspective of this book, we can think of them as modern, advanced and efficient 64-bit processors that implement all the instructions we consider. Additional information can be found in many internet sources. One of them is the Intel website[1], and another commonly used one is Wikipedia[2].

  • Core

All Intel Core processors are based on the Core microarchitecture. Intel uses different naming schemes for these processors. Initially, the names represented the number of physical processor cores in one chip; Core Duo has two physical cores, while Core Quad has four.

  • Core 2

Core 2 is an improved version of the Core microarchitecture. The naming scheme is similar to that of the Core processors.

  • Core i3, i5, i7, i9

With this generation, Intel changed the naming. Since then, the name has given no strict information about the number of cores inside the chip. Instead, it represents the overall performance of the processor.

  • Core X and Core Ultra X

The newest (at the time of writing this book, introduced in 2023) naming drops the “i” letter and introduces the “Ultra” versions of high-performance processors. There exist Core 3 - Core 9 processors as well as Core Ultra 3 - Core Ultra 9.

The most important processors

Model              | Year | Class   | Address bus | Max memory | Clock freq.         | Architecture
8086               | 1978 | 16-bit  | 20 bits     | 1 MB       | 5 - 10 MHz          | x86-16
80186              | 1982 | 16-bit  | 20 bits     | 1 MB       | 6 - 20 MHz          | x86-16
80286              | 1982 | 16-bit  | 24 bits     | 16 MB      | 4 - 25 MHz          | x86-16
80386              | 1985 | 32-bit  | 32 bits     | 4 GB       | 12.5 - 40 MHz       | IA-32
i486               | 1989 | 32-bit  | 32 bits     | 4 GB       | 16 - 100 MHz        | IA-32
Pentium            | 1993 | 32-bit  | 32 bits     | 4 GB       | 60 - 200 MHz        | IA-32
Pentium MMX        | 1996 | 32-bit  | 32 bits     | 4 GB       | 120 - 233 MHz       | IA-32
Pentium Pro        | 1995 | 32-bit  | 36 bits     | 64 GB      | 150 - 200 MHz       | IA-32
Pentium II         | 1997 | 32-bit  | 36 bits     | 64 GB      | 233 - 450 MHz       | IA-32
Pentium III        | 1999 | 32-bit  | 36 bits     | 64 GB      | 400 MHz - 1.4 GHz   | IA-32
Pentium 4          | 2000 | 32-bit  | 36 bits     | 64 GB      | 1.3 GHz - 3.8 GHz   | IA-32
AMD Opteron        | 2003 | 64-bit  | 40 bits     | 1 TB       | 1.4 GHz - 3.5 GHz   | x86-64
Pentium 4 Prescott | 2004 | 64-bit* | 36 bits     | 64 GB      | 1.3 GHz - 3.8 GHz   | IA-32, Intel64*
Pentium D          | 2005 | 64-bit  |             |            | 2.66 GHz - 3.73 GHz | Intel64
Pentium Dual-Core  | 2007 | 64-bit  |             |            | 1.3 GHz - 3.4 GHz   | Intel64
Core 2             | 2006 | 64-bit  | 36 bits     | 64 GB      | 1.06 GHz - 3.5 GHz  | Intel64

* in some models
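
The relation between the address bus width and the maximum memory in the table can be checked with a short calculation. The sketch below assumes byte-addressable memory, so n address bits reach 2^n bytes, and uses binary multiples (1 kB = 1024 B) as the table does. The function names are hypothetical.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with the given number of address bits."""
    return 1 << address_bits


def human_readable(size: int) -> str:
    """Format a byte count using binary multiples (1 kB = 1024 B)."""
    units = ["B", "kB", "MB", "GB", "TB"]
    index = 0
    while size >= 1024 and size % 1024 == 0 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"


if __name__ == "__main__":
    for bits in (20, 24, 32, 36, 40):
        print(bits, "bits ->", human_readable(addressable_bytes(bits)))
```

Each additional address bit doubles the reachable memory, which is why the step from 32 to 36 bits raises the limit from 4 GB to 64 GB.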

en/multiasm/papc/chapter_6_1.txt · Last modified: 2025/04/11 05:46 by ktokarz
CC Attribution-Share Alike 4.0 International