The x64 processors can execute an extensive number of different instructions. In the documentation of processors, we can find several ways of dividing all instructions into groups. The most general division, according to AMD, defines five groups of instructions:
Intel defines the following groups of instructions.
There is also a long list of extensions defined, including SSE4.1, SSE4.2, Intel AVX, AMD 3DNow! and many others. For a detailed description of instruction groups, please refer to
Details of every instruction can be found in the description of the instruction set.
There are also specialised websites with detailed explanations of instructions that you can use to get a lot of additional information. Among others, you can visit:
In this book, we will present most of the general-purpose instructions and provide general ideas on the chosen extensions, including FPU, MMX, SSE, and AVX.
General-purpose instructions can be divided into some subgroups.
Before describing instructions, let's present the condition codes. A condition code takes the form of a suffix to the instruction mnemonic. If the condition is met, the instruction performs its operation; if the condition is not met, the processor moves on to the next instruction in the program. The condition is evaluated from the current state of the flags in the EFLAGS register. The flags are modified mainly by arithmetic, logical, shift, and special flag manipulation instructions. It is important to note that flags are not modified when copying data, so to check whether a value just read is zero, you must first execute an instruction that sets the flags, for example, a comparison. Condition codes together with the flags they check are presented in table 1.
| Condition code cc | Flags checked | Comment |
|---|---|---|
| E | ZF = 1 | Equal |
| Z | ZF = 1 | Zero |
| NE | ZF = 0 | Not equal |
| NZ | ZF = 0 | Not zero |
| A | CF=0 and ZF=0 | Above |
| NBE | CF=0 and ZF=0 | Not below or equal |
| AE | CF=0 | Above or equal |
| NB | CF=0 | Not below |
| B | CF=1 | Below |
| NAE | CF=1 | Not above or equal |
| BE | CF=1 or ZF=1 | Below or equal |
| NA | CF=1 or ZF=1 | Not above |
| G | ZF=0 and SF=OF | Greater |
| NLE | ZF=0 and SF=OF | Not less or equal |
| GE | SF=OF | Greater or equal |
| NL | SF=OF | Not less |
| L | SF<>OF | Less |
| NGE | SF<>OF | Not greater or equal |
| LE | ZF=1 or SF<>OF | Less or equal |
| NG | ZF=1 or SF<>OF | Not greater |
| C | CF=1 | Carry |
| NC | CF=0 | Not carry |
| O | OF=1 | Overflow |
| NO | OF=0 | Not overflow |
| S | SF=1 | Sign (negative) |
| NS | SF=0 | Not sign (non-negative) |
| P | PF=1 | Parity |
| PE | PF=1 | Parity even |
| NP | PF=0 | Not parity |
| PO | PF=0 | Parity odd |
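The remark that copying data leaves the flags untouched can be illustrated with a short sketch. It assumes that rbx holds the address of a doubleword in memory; the label value_is_zero is hypothetical.

```asm
    mov  eax, dword ptr [rbx]   ; copy a value from memory; mov leaves the flags unchanged
    test eax, eax               ; AND eax with itself only to set ZF, SF and PF
    jz   value_is_zero          ; taken when ZF = 1 (the Z and E codes test the same flag)
    ; ... code for a non-zero value ...
value_is_zero:
```

Without the test instruction, the conditional jump would react to flags left behind by some earlier instruction.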
Almost all assembler tutorials start with the presentation of the mov instruction, which is used to copy data from the source operand to the destination operand. Our book is not an exception, and we've already shown this instruction in examples presented in previous sections.
Let's look at some additional variants.
```asm
mov al, bl      ;copy one byte from bl to al
mov ax, bx      ;copy word (two bytes) from bx to ax
mov eax, ebx    ;copy doubleword (four bytes) from ebx to eax
mov rax, rbx    ;copy quadword (eight bytes) from rbx to rax
```
In the mov instruction, the size of the source argument must be the same as the size of the destination argument. Arguments can be stored in registers or in memory addressed directly or indirectly. One of them can be a constant (immediate). Only one memory argument is allowed. This follows from the instruction encoding, which has room for only one directly or indirectly addressed memory argument. That's why most instructions, not only mov, can operate with one memory argument only. There are some exceptions, for example, string instructions, but such instructions use specific indirect addressing.
```asm
mov al, 100     ;0xB0, 0x64
                ;copy constant (immediate) of the value 100 (0x64) to al
mov al, [bx]    ;0x67, 0x8A, 0x07
                ;copy byte from the memory at address stored in bx to al (indirect addressing)

;Notice the difference between two following instructions
mov eax, 100    ;0xB8, 0x64, 0x00, 0x00, 0x00
                ;copy constant 100 to eax
mov eax, [100]  ;0xA1, 0x64, 0x00, 0x00, 0x00
                ;copy value from memory at address 100

;It is possible to copy a constant to memory addressed directly or indirectly
;operand size specifier dword ptr is required to inform the assembler about the size of the argument
mov dword ptr ds:[200], 100   ;0xC7, 0x05, 0xC8, 0x00, 0x00, 0x00, 0x64, 0x00, 0x00, 0x00
                              ;copy value of 100, encoded as dword (four bytes), 0x64 = 100
                              ;to memory at address 200, encoded as four bytes, 0xC8 = 200
mov dword ptr [ebx], 100      ;0xC7, 0x03, 0x64, 0x00, 0x00, 0x00
                              ;copy value of 100, encoded as dword (four bytes), 0x64 = 100
                              ;to memory addressed by ebx
```
Starting from the P6 machines, the conditional move instruction cmovcc was introduced. This works similarly to mov, but copies data only if the specified condition is true. The condition code is one of the codes presented in the section “Condition Codes”. If the condition is false, the instruction simply passes through without modifying the arguments. Conditional move instructions can be used to avoid conditional jumps. For example, if we need to copy data from ebx to ecx only when the result of the previous operation is negative, we can write the following instruction.
cmovs ecx, ebx
In the situation of copying data of a smaller size (expressed in number of bits) to a bigger destination argument, the question arises as to what to do with the remaining bits. Let us consider copying an 8-bit value from bl to the 16-bit ax register. If the value copied is unsigned or positive (let it be 5), the remaining bits should be cleared.
```asm
;                 ah      al
mov al, bl      ;         00000101 = 5 in al
mov ah, 0       ; 00000000
                ; 0000000000000101 = 5 in ax
```
If the value is negative (e.g. -5) the situation changes.
```asm
;                 ah      al
mov al, bl      ;         11111011 = -5 in al
mov ah, 0       ; 00000000
                ; 0000000011111011 = 251 in ax
```
It is visible that to preserve the original value, the upper bits must be filled with ones, not zeros.
```asm
;                 ah      al
mov al, bl      ;         11111011 = -5 in al
mov ah, 0xFF    ; 11111111
                ; 1111111111111011 = -5 in ax
```
There are special instructions which perform automatic sign extension, copying the sign bit to all higher bit positions. They can be considered as type conversion instructions. These instructions do not have any arguments as they operate on the accumulator only.
Sign extension instructions work solely with the accumulator. Fortunately, there are also more universal instructions which copy and extend data at the same time.
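As a minimal sketch of both approaches, the fragment below uses cbw as an example of an accumulator-only sign extension, and movsx and movzx as examples of instructions that copy and extend in one step. The choice of these three instructions and of the registers is an assumption made for illustration.

```asm
mov   bl, -5          ; bl = 11111011b
mov   al, bl
cbw                   ; sign-extend al into ax: ax = 0FFFBh = -5

movsx ecx, bl         ; copy with sign extension: ecx = 0FFFFFFFBh = -5
movzx edx, bl         ; copy with zero extension: edx = 000000FBh = 251
```

The zero-extending copy is the right choice for unsigned data, while the sign-extending copy preserves the value of signed data.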
The exchange instructions swap the values of operands. A single exchange instruction can replace three mov instructions while swapping the contents of two arguments, so they can be useful in optimising some algorithms. They are helpful in the implementation of semaphores, even in multiprocessor systems. The xchg instruction swaps the values of two arguments. If one of the arguments is in memory, the instruction behaves as if it had the LOCK prefix, allowing for semaphore implementation. The cmpxchg instruction works with three values: source, destination and accumulator; the accumulator is implicit and is not written in the instruction. It compares the destination argument with the accumulator; if they are equal, the destination argument value is replaced with the value from the source operand; otherwise, the destination value is loaded into the accumulator. It is used to test and modify semaphores. Its operation is presented in figure 1. In newer machines, the eight- and sixteen-byte versions were added: cmpxchg8b and cmpxchg16b. They always use ECX:EBX or RCX:RBX as the source argument and EDX:EAX or RDX:RAX as the accumulator. The destination argument is in memory.
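A simple lock built on cmpxchg could look like the sketch below. It is only an illustration under stated assumptions: rbx is assumed to hold the address of a doubleword lock variable in which 0 means free and 1 means taken, and the label acquire is hypothetical.

```asm
acquire:
    xor  eax, eax                     ; accumulator = 0, the value expected when the lock is free
    mov  ecx, 1                       ; source = 1, the value that marks the lock as taken
    lock cmpxchg dword ptr [rbx], ecx ; if [rbx] == eax then [rbx] = ecx and ZF = 1
    jnz  acquire                      ; ZF = 0: the lock was already taken, try again
    ; ... critical section ...
    mov  dword ptr [rbx], 0           ; release the lock
```

The accumulator is reloaded on every attempt because a failed cmpxchg overwrites it with the current value of the destination.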
The xadd instruction exchanges two arguments, adds them, and stores the sum in a destination argument. Together with a LOCK prefix, it can be used to implement a DO loop executed by more than one processor simultaneously.
The bswap instruction is a single-argument instruction; it changes the order of bytes in a 32- or 64-bit register. It can be used to convert little-endian data to big-endian representation and vice versa, as shown in figure 2.
A stack is a special structure in the memory that automatically stores the return address (address of the next instruction) when a procedure is called (it is described in detail in the section about the call instruction). It is also possible to use the stack for local variables in functions, to pass arguments to procedures, and for temporary data storage. In x86 architecture, the stack is supported by hardware with the special stack pointer register. Instructions operating on the stack automatically modify the stack pointer in a way that it always points to the top of the stack. The push instruction decrements the stack pointer and places the data onto the stack. As a result, the stack pointer points to the last data on the stack. It is shown in figure 3.
The pop instruction takes data off the stack, copies it into the destination argument, and increments the stack pointer. After its execution, the stack pointer points to the previous data stored on the stack. It is shown in figure 4.
There are also instructions that push or pop all eight general-purpose registers (including the stack pointer). The 16-bit registers are pushed with pusha and popped with popa instructions. For 32-bit registers, the pushad and popad instructions can be used, respectively. The order of registers on the stack is shown in figure 5. These instructions are not supported in 64-bit mode.
Arithmetic instructions perform calculations on binary encoded data. It is worth noting that the processor does not distinguish between unsigned and signed values; it is the responsibility of the programmer to provide correct input values and properly interpret the results obtained.
There are two adding instructions. The add instruction adds two values from the destination and source arguments and stores the result in the destination argument. It modifies the flags in the EFLAGS register according to the result. The adc instruction additionally adds “1” if the carry flag (CF) is set. It allows the processor to calculate the sum of values wider than can be encoded in a register (for example, 128-bit integers in a 64-bit processor). Similarly, there are two subtraction instructions. The sub instruction subtracts the source argument from the destination argument, stores the result in the destination, and modifies the flags according to the result. The sbb instruction calculates the difference of arguments minus “1” if the CF flag is set (here, CF plays the role of the borrow flag).
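The use of the carry flag for multi-word arithmetic can be sketched as follows. The assignment of registers is an assumption made for this example: the first 128-bit value is held in rdx:rax and the second in rcx:rbx (high half:low half).

```asm
; 128-bit addition: rdx:rax = rdx:rax + rcx:rbx
add rax, rbx          ; add the low halves, CF = carry out of bit 63
adc rdx, rcx          ; add the high halves plus the carry

; 128-bit subtraction: rdx:rax = rdx:rax - rcx:rbx
sub rax, rbx          ; subtract the low halves, CF = borrow
sbb rdx, rcx          ; subtract the high halves minus the borrow
```

Executed one after the other, the two sequences leave the original first value in rdx:rax.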
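The inc instruction adds “1” to its argument, and the dec instruction subtracts “1” from it. As with add and sub, the same operation serves both unsigned and signed values. Unlike add and sub, the inc and dec instructions leave the carry flag (CF) unchanged.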
Two multiply instructions are implemented. The mul is a one-argument instruction. It multiplies the content of the argument and the accumulator, treated as unsigned numbers. The size of the accumulator corresponds to the size of the argument. As the product can need twice as many bits as the input values, the result is stored in an accumulator twice as wide, as shown in table 2.
| Argument | Accumulator | Result |
|---|---|---|
| 8 bits | AL | AX |
| 16 bits | AX | DX:AX |
| 32 bits | EAX | EDX:EAX |
| 64 bits | RAX | RDX:RAX |
The imul instruction implements the signed multiply. It can have one, two or three arguments. The single-argument version behaves the same way as the mul instruction, except that the values are treated as signed numbers. The two-argument version multiplies the destination operand, which must be a 16-, 32-, or 64-bit register, by the source argument of the same size and stores the result in the destination. The three-argument version multiplies the content of the source argument by the immediate and stores the result in the destination of the same size as the arguments. The destination must be a register. In the two- and three-argument versions, the result is truncated to the size of the destination.
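The following sketch shows the one-argument unsigned multiply with a result that does not fit in 32 bits, followed by the two- and three-argument forms of imul. The values and registers are chosen only for illustration.

```asm
mov  eax, 100000
mov  ecx, 100000
mul  ecx              ; unsigned: edx:eax = 10 000 000 000 = 2540BE400h
                      ; edx = 2, eax = 540BE400h

mov  eax, -3
imul eax, eax, 7      ; three-argument form: eax = eax * 7 = -21
mov  ebx, 4
imul ebx, eax         ; two-argument form: ebx = ebx * eax = -84
```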
Two divide instructions are implemented. The div is a one-argument instruction. It divides the content of the accumulator by the argument, treated as unsigned numbers. The size of the accumulator is twice as big as the size of the argument. The result is stored as two integer values of the same size as the argument. The quotient is placed in the lower half of the accumulator, and the remainder in the higher half of the accumulator. Depending on the size of the argument, the accumulator is understood as AX or as a pair of registers DX:AX, EDX:EAX or RDX:RAX, as shown in table 3.
| Argument | Accumulator | Quotient | Remainder |
|---|---|---|---|
| 8 bits | AX | AL | AH |
| 16 bits | DX:AX | AX | DX |
| 32 bits | EDX:EAX | EAX | EDX |
| 64 bits | RDX:RAX | RAX | RDX |
The idiv instruction implements the signed divide. It behaves the same way as the div instruction except for the type of numbers.
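Because the dividend is twice as wide as the divisor, its upper half has to be prepared before dividing. The sketch below clears it for the unsigned case and uses the cdq sign extension instruction for the signed case; the values are chosen only for illustration.

```asm
; unsigned: 1000 / 7
mov  eax, 1000
xor  edx, edx         ; the dividend is edx:eax, so the upper half must be zero
mov  ecx, 7
div  ecx              ; eax = 142 (quotient), edx = 6 (remainder)

; signed: -1000 / 7
mov  eax, -1000
cdq                   ; sign-extend eax into edx:eax
idiv ecx              ; eax = -142 (quotient), edx = -6 (remainder)
```

If the divisor is zero or the quotient does not fit in the lower half of the accumulator, the processor raises the divide error exception.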
The set of logical instructions contains and, or, xor and not instructions. All of them perform bitwise Boolean operations corresponding to their names. The not is a single-argument instruction; others have two arguments.
Shift and rotate instructions treat the argument as the shift register. Each bit of the argument is moved to the neighbouring position on the left or right, depending on the shift direction. The number of bit positions for the shift can be specified as a constant or in the CL register. Shift instructions can be used for multiplying (shift left) and dividing (shift right) by a power of two. Shift instructions have two versions: logical and arithmetical. Logical shift left shl and arithmetical shift left sal behave the same, filling the empty bits (at the LSB position) with zeros. Logical shift right shr fills the empty bits (at the MSB position) with zeros, while the arithmetical shift right sar makes a copy of the most significant bit, preserving the sign of a value. It is shown in figure 6.
There are two double shift instructions which move bits from the source argument to the destination argument. The number of bits is specified as the third argument. Shift double right has shrd mnemonic, while shift double left has shld mnemonic. The operation of shift double instructions is presented in figure 7.
For all shift instructions, the last bit shifted out is placed in the carry flag.
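The difference between the logical and arithmetical shift right is easiest to see on a negative value. The fragment below is a small sketch with values chosen for illustration; the variable shift count is taken from the cl register.

```asm
mov eax, 5
shl eax, 3            ; eax = 5 * 8 = 40

mov ebx, -40
sar ebx, 2            ; arithmetical shift keeps the sign: ebx = -10

mov edx, -40          ; edx = 0FFFFFFD8h
shr edx, 2            ; logical shift inserts zeros: edx = 3FFFFFF6h

mov cl, 4
shl eax, cl           ; shift count taken from cl: eax = 40 * 16 = 640
```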
Rotate instructions shift bits left rol or right ror in the argument, and additionally move bits around from the lowest to the highest or from the highest to the lowest position. Behaviour of rotate instructions is shown in figure 8.
Rotate through carry left rcl and right rcr, treat the carry flag as the additional bit while rotating. They can be used to collect bits to form multi-bit data. Behaviour of rotate with carry instructions is shown in figure 9.
Bit test instruction bt makes a copy of the selected bit in the carry flag. The bit for testing is specified by a combination of two arguments. The first argument, named the bit base operand, holds the bit. It can be a register or a memory location. The second operand is the bit offset, which specifies the position of the bit operand. It can be a register or an immediate value. It starts counting from 0, so the least significant bit has the position 0. An example of the behaviour of the bt instruction is shown in figure 10.
Bit test and modify instructions first make a copy of the selected bit, and next modify the original bit value with the one specified by the instruction. The bts sets the bit to one, btr clears the bit (resets to zero value), btc changes the state of the bit to the opposite (complements).
The bit scan instructions search for the first occurrence of the bit of the value 1. The bit scan forward bsf scans starting from the least significant bit towards higher bits, bit scan reverse bsr starts from the most significant bit towards lower bits. Both instructions return the index of the found bit in the destination register. If there is no bit of the value 1, the zero flag is set, and the destination register value is undefined.
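The bit test, bit test and modify, and bit scan instructions can be followed step by step in the sketch below. The bit pattern is chosen only for illustration.

```asm
mov eax, 00101000b    ; bits 3 and 5 are set
bt  eax, 3            ; CF = 1, eax is unchanged
bts eax, 0            ; CF = 0 (old bit 0), eax = 00101001b
btr eax, 5            ; CF = 1 (old bit 5), eax = 00001001b
btc eax, 3            ; CF = 1 (old bit 3), eax = 00000001b

mov edx, 00101000b
bsf ecx, edx          ; ecx = 3, index of the lowest bit set to 1
bsr ecx, edx          ; ecx = 5, index of the highest bit set to 1
```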
The test instruction performs the logical AND function without storing the result. It just modifies flags according to the result of the AND operation.
The setcc instruction sets the argument to 1 if the chosen condition is met, or clears the argument if the condition is not met. The condition can be freely chosen from the set of conditions available for other instructions, for example, cmovcc. This instruction is useful to convert the result of the operation into the Boolean representation.
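As a short sketch, the test and setcc instructions can be combined to produce a Boolean value that tells whether a number is even. The setcc instruction writes only an 8-bit argument, so movzx is used here to widen the result; the registers are chosen for illustration.

```asm
mov   eax, 6
test  eax, 1          ; AND with the mask 1: the result is 0, so ZF = 1; eax is unchanged
setz  cl              ; cl = 1 because ZF = 1 (the number is even)
movzx ecx, cl         ; widen the 8-bit Boolean to 32 bits: ecx = 1
```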
The popcnt instruction counts the number of bits equal to “1” in its argument. The applications of this instruction include genome mining, handwriting recognition, digital health workloads, and fast Hamming distance counts[7].
The crc32 instruction implements the calculation of the cyclic redundancy check in hardware. The polynomial of the value 11EDC6F41h is fixed.
Before describing the instructions used for control transfer, we will discuss how the destination address can be calculated. The destination address is the address given to the processor to make a jump to.
While the segmentation is enabled, the destination address can be given as the offset only or in full logical form. If there is an offset only, the instruction modifies solely the instruction pointer, the jump is performed within the current segment and is called near. If the address is provided in full logical form, containing segment and offset parts, the CS and IP registers are modified. Such an instruction can perform a jump between segments and is called far.
An absolute address is given as a value specifying the destination address as the number of the byte counted from the beginning of the memory, or, if segmentation is enabled, as the offset from the beginning of the segment. A relative address is calculated as the difference between the current value of the instruction pointer and the absolute destination address. It is provided in the instructions as the signed number representing the distance between the current and destination addresses. If it is possible to encode the difference as an 8-bit signed value, the jump is called short. Usually, an assembler automatically chooses the shortest possible encoding.
Conditional transfer instructions check the state of chosen flags in the Flags register and perform the jump to the specified address if the condition gives a true result. If the condition results in false, the processor goes to the next instruction in the instruction stream. Conditions are specified the same way as in cmovcc instruction as the suffix to the main mnemonic. Unconditional transfer instructions are always executed the same way. They jump to the specified address without any condition checking.
Unconditional control transfer instructions perform the jump to the new address to change the program flow. The jmp instruction jumps to a destination address by putting the destination address in the instruction pointer register. If segmentation is enabled and the destination address is placed in another segment than the current one, it also modifies the CS register. The call instruction is designed to handle subroutines. It also jumps to a destination address, but before putting the new value into the instruction pointer, it pushes the returning address onto the stack. The returning address is the address of the next instruction after the call. This allows the processor to use the returning address later to get back from the subroutine to the main program. The ret instruction forms a pair with the call. It uses the information stored on the stack to return from a subroutine. The process of calling a procedure and returning to the main program is shown in figure 11.
An interrupt mechanism in x86 works with hardware-signalled interrupts or with special interrupt instructions. Return from an interrupt is performed by executing the iret instruction. With a 32-bit operand size, the mnemonic for this instruction is iretd, and in 64-bit mode it is iretq. The iret instruction differs from the ret instruction in that it pops from the stack not only the return address but also the content of the Flags register. This keeps the content of this register unmodified after return, and additionally prevents unintentional blocking of subsequent interrupts. The process of interrupt handler calling and returning to the main program is shown in figure 12.
Software interrupts are handled the same way as those signalled by the hardware. The int instruction signals the interrupt of a given number. There are also some special interrupt instructions. The int1 and int3 are one-byte special machine codes used for debugging; into signals a software overflow exception if the OF flag is set; and bound raises the bound range exceeded exception (int 5) when the tested value is over or under the defined bounds. The last two instructions are not valid in 64-bit mode.
The jcc instructions are used to test the state of flags and perform the jump to the destination address if the condition is met. In modern pipelined processors, it is recommended to avoid using conditional jumps if possible, ensuring that the program flows continuously, without the need to invalidate the pipeline. It is important to remember that flags are modified as a result of executing the arithmetic or logic instruction, but not the mov instruction. For example, if we need to test if some variable is zero, we can write such code:
```asm
    cmp var1, 0     ;compare variable
    jz is_zero      ;conditional jump to address is_zero
    mov rax, "1"    ;if not zero put ASCII code of "1" in rax
    jmp not_zero    ;jump unconditionally over next instruction
is_zero:            ;label to jump to if var1 is zero
    mov rax, "0"    ;if zero put ASCII code of "0" in rax
not_zero:           ;label to jump to if var1 is not zero
```
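Since avoiding conditional jumps is recommended, the same decision can also be sketched without any jump, using the conditional move. The use of rdx as a helper register is an assumption of this example; it is needed because cmovcc cannot take an immediate as the source.

```asm
mov   rax, "1"        ; assume that var1 is not zero
mov   rdx, "0"        ; value to use when var1 is zero
cmp   var1, 0         ; set the flags; the mov instructions above did not change them
cmovz rax, rdx        ; if var1 is zero, replace the content of rax with "0"
```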
The loop instruction is used to implement a loop which is executed a known number of times. The number of iterations should be set before the loop in the counter register (CX/ECX/RCX). The loop instruction automatically decrements the counter register and checks whether it has reached zero. If not, it jumps to the address given as the argument of the instruction, which is assumed to be the beginning of the loop. If the counter has reached zero, execution continues with the next instruction in the stream. There are also conditional versions of the loop instruction, which allow finishing the iteration process before the counter reaches zero. The loope or loopz instructions continue the iteration if the counter is not zero and the zero flag (ZF) is set. The loopne or loopnz instructions continue the iteration if the counter is not zero and the zero flag (ZF) is cleared. The loop instruction can cause the system to iterate many times if the counter register is zero before entering the loop. As the first step is the decrementing of the counter, it will result in a value composed of all “1”. For CX, the loop will be executed 65536 times, for ECX more than 4 billion times and for RCX 18 quintillion 446 quadrillion 744 trillion 73 billion 709 million 551 thousand and 616 times! Understandably, we should avoid such a situation. The jcxz, jecxz and jrcxz instructions can help to jump over the entire loop if the counter register is zero at the beginning, as in the following code.
```asm
    lea rbx, table      ;table with 64-bit values to sum
    mov rcx, size       ;size of a table (number of items) - we can't ensure it's not zero
    xor rdx, rdx        ;zero rdx - it will be the sum of elements
    jrcxz end_loop      ;jump over the loop if rcx is zero
begin_loop:
    add rdx, [rbx]      ;add the 64-bit item to the resulting value
    add rbx, 8          ;point to the next 64-bit item in a table
    loop begin_loop     ;decrement rcx and repeat until it reaches zero
end_loop:
```
String instructions are developed to perform operations on elements of data tables, including text strings. These instructions can access two elements in memory - source and destination. If segmentation is enabled, the source operand is identified with SI/ESI and always placed in the data segment (DS), while the destination operand is identified with DI/EDI and stored in the extra segment (ES). In 64-bit mode, the source operand is identified with RSI, and the destination operand is identified with RDI. They can operate on bytes, words, doublewords or quadwords. The size of the element is specified as the suffix of the instruction or derived from the size of the arguments specified in the instruction.
The movs instruction copies the element of the source string to the destination string. It requires two arguments of the size of bytes, words, doublewords or quadwords. The movsb instruction copies a byte from the source string to the destination string. The movsw instruction copies a word from the source string to the destination string. The movsd instruction copies a doubleword from the source string to the destination string. The movsq instruction copies a quadword from the source string to the destination string.
These instructions store the content of the accumulator to the destination operand. The stos instruction copies the content of the accumulator to the destination string. It requires one argument of the size of byte, word, doubleword or quadword. The stosb instruction copies a byte from the AL to the destination string. The stosw instruction copies a word from the AX to the destination string. The stosd instruction copies a doubleword from the EAX to the destination string. The stosq instruction copies a quadword from the RAX to the destination string.
These instructions load the content of the source string to the accumulator. The lods instruction copies the content of the source string to the accumulator. It requires one argument of the size of byte, word, doubleword or quadword. The lodsb instruction copies a byte from the source string to the AL. The lodsw instruction copies a word from the source string to the AX. The lodsd instruction copies a doubleword from the source string to the EAX. The lodsq instruction copies a quadword from the source string to the RAX.
Strings can be compared, which means that the element of the destination string is compared with the element of the source string. These instructions set the status flags in the flags register according to the result of the comparison. The elements of both strings remain unchanged. The cmps instruction compares the element of a source string with the element of the destination string. It requires two arguments, which specify the size of the data elements. The cmpsb instruction compares a byte from the source string with a byte from the destination string. The cmpsw instruction compares a word from the source string with a word from the destination string. The cmpsd instruction compares a doubleword from the source string with a doubleword from the destination string. The cmpsq instruction compares a quadword from the source string with a quadword from the destination string.
Strings can be scanned, which means that the element of the destination string is compared with the accumulator. These instructions set the status flags in the flags register according to the result of the comparison. The accumulator and string element remain unchanged. The scas instruction compares the accumulator with the element of the destination string. It requires one argument, which specifies the size of the accumulator and the data element. The scasb instruction compares the AL with a byte from the destination string. The scasw instruction compares the AX with a word from the destination string. The scasd instruction compares the EAX with a doubleword from the destination string. The scasq instruction compares the RAX with a quadword from the destination string.
All string instructions can be preceded by the repetition prefix to automate the processing of multiple-element tables. With the prefix, the instruction is automatically repeated according to the content of the counter register, and the source and destination addresses in the index registers are modified according to the size of the element. Index registers can be incremented or decremented depending on the direction flag (DF) state. If DF is “0”, the addresses are incremented; if DF is “1”, the addresses are decremented. When the string element's size is a byte, the addresses are modified by 1. For words, the addresses are modified by 2, for doublewords by 4, and for quadwords by 8. The rep prefix allows block copying, storing and loading of an entire string rather than a single element. The use of repeated string instructions enables copying the entire string from one place in memory to another, or filling up the memory regions with a pattern.
The repe or repz prefixes repeat the instruction while the counter is not zero and the zero flag is “1”, so the process of string scan or comparison finishes prematurely at the first element that differs. The repne or repnz prefixes repeat the instruction while the counter is not zero and the zero flag is “0”, so the iteration stops at the first element that matches. The conditional prefixes are intended to be used with scas or cmps instructions. The use of repeated string instructions with conditional prefixes enables string comparison for equality or differences, or to find the element in a string.
To properly use the repeated string instructions, follow these steps:
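A typical sequence can be sketched as follows: choose the direction with the direction flag, load the source and destination addresses into the index registers, load the number of elements into the counter register, and execute the string instruction with the prefix. The buffers src_buf and dst_buf (assumed to hold at least 64 bytes each) and the label not_found are hypothetical names used only in this example.

```asm
    ; copy 64 bytes from src_buf to dst_buf
    cld                   ; DF = 0: addresses are incremented
    lea rsi, src_buf      ; source address
    lea rdi, dst_buf      ; destination address
    mov rcx, 64           ; number of elements to process
    rep movsb             ; copy byte by byte until rcx reaches zero

    ; search the first 64 bytes of dst_buf for a zero byte
    cld
    lea rdi, dst_buf
    mov rcx, 64
    xor al, al            ; value to look for
    repne scasb           ; repeat while rcx is not zero and the byte differs from al
    jne not_found         ; ZF = 0 after the scan: no matching byte
    dec rdi               ; rdi was moved past the match; step back to it
not_found:
```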
These instructions allow the processor to transfer data between the accumulator register and a peripheral device. A peripheral device can be addressed directly or indirectly. Direct addressing uses an 8-bit constant as the peripheral address (called an I/O port in x86), and it accesses only the first 256 port addresses. Indirect addressing uses the DX register as the address register, enabling access to the entire I/O address space of 65536 addresses. The in instruction reads data from a port to the accumulator. The out instruction writes the data from the accumulator to the port. The size of the accumulator determines the size of the data to be transferred. It can be AL, AX or EAX. The I/O instructions also have string versions. Instructions to read the port to a string are ins, insb, insw, and insd. Instructions to write a string to a port are outs, outsb, outsw, and outsd. In all string I/O instructions, the port is addressed with the DX register. Rules for addressing the memory are the same as in string instructions.
The enter instruction creates the stack frame for the function. The stack frame is a place on the stack reserved for the function to store arguments and local variables. Traditionally, we access the stack frame with the use of the RBP register, but we need to preserve its content before use. The enter instruction can be nested or non-nested. The non-nested version saves RBP on the stack, copies the stack pointer value to RBP, and decreases the stack pointer by the constant value which is the first operand of the instruction. After these steps, RSP points to the top of the stack frame, and RBP points to its base. The nested version additionally creates the path to the higher-level functions' stack frames by copying their frame pointers (the values RBP had in those functions) into the new frame. The leave instruction reverses what enter did and is used at the end of the function. The enter should be placed at the very beginning of the function, while the leave just before ret.
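The steps performed by the non-nested enter and by leave can be written out with ordinary instructions, as in the sketch below. The procedure name my_func and the frame size of 32 bytes are hypothetical.

```asm
my_func proc
    push rbp                    ; save the caller's frame pointer
    mov  rbp, rsp               ; rbp = base of the new stack frame
    sub  rsp, 32                ; reserve 32 bytes for local variables
                                ; the three lines above do the same as: enter 32, 0
    mov  qword ptr [rbp-8], 0   ; example access to a local variable
    mov  rsp, rbp               ; release the local variables
    pop  rbp                    ; restore the caller's frame pointer
                                ; the two lines above do the same as: leave
    ret
my_func endp
```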
Flag control instructions are typically used to set or clear the chosen flag in the RFLAGS register. We can only control three flags directly. The carry (CF) flag can be used in conjunction with the rotate-with-carry instructions to convert the series of bits into a binary-encoded value. The direction (DF) flag determines the direction of modification of index registers RSI and RDI when executing string instructions. If the DF flag is clear, the index registers are incremented; if the DF flag is set, the registers are decremented after each iteration of a string instruction. The interrupt (IF) flag enables or disables hardware interrupts. If the IF flag is set, the hardware interrupts are enabled; if the IF flag is clear, hardware interrupts are masked. The summary of instructions is shown in table 4.
| Instruction | Behaviour | Flag affected |
|---|---|---|
| stc | set carry flag | CF=1 |
| clc | clear carry flag | CF=0 |
| cmc | complement carry flag | CF=not CF |
| std | set direction flag | DF=1 |
| cld | clear direction flag | DF=0 |
| sti | set interrupt flag | IF=1 |
| cli | clear interrupt flag | IF=0 |
The flags register can be pushed onto the stack and popped afterwards. This is useful for preserving the flags inside a procedure, and also for testing or modifying bits in the flags register that have no dedicated instruction. The pushf pushes the FLAGS register, the pushfd pushes the EFLAGS register, and the pushfq pushes the RFLAGS register onto the stack. The popf pops the FLAGS register, the popfd pops the EFLAGS register, and the popfq pops the RFLAGS register from the stack. It is also possible to copy SF, ZF, AF, PF, and CF to the AH register with the lahf instruction, and to store these flags back from AH with the sahf instruction.
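The effect of lahf and sahf can be illustrated with a short C model. It is a sketch, not processor code: `lahf_model` and `sahf_model` are hypothetical names, and the flag positions used are those of the low byte of the flags register (CF in bit 0, PF in bit 2, AF in bit 4, ZF in bit 6, SF in bit 7, with bit 1 always read as 1).

```c
#include <stdint.h>

/* Bit positions of the status flags in the low byte of the flags register. */
enum { CF_BIT = 0, PF_BIT = 2, AF_BIT = 4, ZF_BIT = 6, SF_BIT = 7 };

/* SF, ZF, AF, PF and CF together: 0xD5. */
#define STATUS_MASK ((1u << SF_BIT) | (1u << ZF_BIT) | (1u << AF_BIT) | \
                     (1u << PF_BIT) | (1u << CF_BIT))

/* Model of lahf: returns the value that would be loaded into AH. */
uint8_t lahf_model(uint64_t rflags) {
    return (uint8_t)((rflags & STATUS_MASK) | 0x02u); /* bit 1 reads as 1 */
}

/* Model of sahf: returns the flags value with SF, ZF, AF, PF and CF
   replaced by the corresponding bits of AH; other flags are unchanged. */
uint64_t sahf_model(uint64_t rflags, uint8_t ah) {
    return (rflags & ~(uint64_t)STATUS_MASK) | (ah & STATUS_MASK);
}
```

For example, `sahf_model(flags, 0x00)` clears the five status flags and leaves flags such as IF and DF untouched.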
Segment register instructions are used to load a far pointer into a pair of registers. One register of the pair is the segment register, which is determined by the instruction; the other receives the offset and appears as the destination argument. The source argument is the far pointer stored in memory. These instructions include lds – load far pointer using DS, les – load far pointer using ES, lfs – load far pointer using FS, lgs – load far pointer using GS, and lss – load far pointer using SS. The following example shows loading a far pointer in 16-bit mode.
    ; Load far pointer to DS:BX
    ; Variable Far_point holds the 32-bit address
    lds BX,Far_point
    ; Instruction above is equivalent to the following sequence
    ; (except that the sequence also overwrites AX):
    mov AX,WORD PTR Far_point+2   ; Take higher word of far pointer
    mov DS,AX                     ; Store it in DS
    mov BX,WORD PTR Far_point     ; Store lower word of far pointer in BX
In 64-bit mode, lds and les instructions are not supported.
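The way a 16-bit far pointer is split between the two registers can be modelled in C. This is a minimal sketch; the `FarPtr16` type and `load_far_pointer` function are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

typedef struct {
    uint16_t segment; /* goes to the segment register, e.g. DS */
    uint16_t offset;  /* goes to the destination register, e.g. BX */
} FarPtr16;

/* Split a 32-bit far pointer as it lies in memory (little-endian):
   the lower word is the offset, the higher word is the segment. */
FarPtr16 load_far_pointer(const uint8_t bytes[4]) {
    FarPtr16 p;
    p.offset  = (uint16_t)(bytes[0] | (bytes[1] << 8));
    p.segment = (uint16_t)(bytes[2] | (bytes[3] << 8));
    return p;
}
```

For the bytes `34 12 00 B8` in memory, the model yields the segment B800h and the offset 1234h.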
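The nop instruction performs no operation. Its only effect is to advance the instruction pointer. Historically, it is an alias of the instruction xchg ax, ax (xchg eax, eax with a 32-bit operand size), which has the same single-byte opcode. In 64-bit mode, the opcode 0x90 is treated as a true nop and does not zero-extend EAX into RAX.
    nop             ; encoded as 0x90
    xchg eax, eax   ; encoded as 0x90 outside 64-bit mode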
The lea instruction calculates the effective address described by an address expression and stores it in the destination operand, without accessing memory. We can store the effective address in a single register to avoid complex address calculation inside a loop, as in the following example.
    ; Load effective address to BX
    ; Table is the beginning of the table in the memory
    lea BX,Table[SI]
    ; Now we can use BX only to make the program run faster:
    hoop:
    mov AX,[BX]   ; Take value from table
    add BX,2      ; Next element in the table (each word takes 2 bytes)
    cmp AX,0      ; Check if element is 0
    jne hoop      ; Jump to "hoop" if AX isn't 0
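The address expression that lea evaluates has the general form base + index × scale + displacement. The following C function is a small model of that calculation; the name `effective_address` and its parameter list are assumptions made for illustration only.

```c
#include <stdint.h>

/* Model of the effective address calculation performed by lea:
   base + index * scale + displacement, wrapping at 64 bits.
   On real hardware the scale can only be 1, 2, 4 or 8. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp) {
    /* The displacement is sign-extended before it is added. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, with base 1000h, index 3, scale 8 and displacement 16, the result is 1028h. Because no memory is read, lea is also often used as a plain arithmetic instruction.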
The undefined instructions can be used to test the behaviour of the system software when an unknown opcode appears in the instruction stream. The ud0 and ud1 instructions can take a source operand (register or memory address) and a destination operand (register); the operands are not used. The ud2 instruction has no operands. Executing any of these instructions raises an invalid opcode exception (#UD).
The xlatb instruction copies a byte from a table into the AL register. The byte is addressed as the sum of the BX/EBX/RBX register and the unsigned value of AL. There is also an xlat version, which allows a memory address to be specified as the argument. This can be somewhat misleading because the argument is never used by the processor. This instruction can be used to implement the conversion from a 4-bit binary value into a hexadecimal digit, as in the following code.
    .DATA
    conv_table DB "0123456789ABCDEF"
    char DB ?          ; Place for the resulting character
    .CODE
    ; Load base address of table to RBX
    lea RBX, conv_table
    and AL, 0Fh        ; Limit AL to 4 bits
    xlatb              ; Take element from the table
    mov char, AL       ; Resulting char is in AL
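The same lookup can be written as a C model, which makes the addressing explicit: the table base plays the role of RBX and the index plays the role of AL. The function names `xlatb_model` and `nibble_to_hex` are hypothetical.

```c
#include <stdint.h>

static const char conv_table[] = "0123456789ABCDEF";

/* Model of xlatb: AL = byte at address (table base + AL). */
uint8_t xlatb_model(const uint8_t *table, uint8_t al) {
    return table[al];
}

/* Convert the low 4 bits of a value to a hexadecimal digit. */
char nibble_to_hex(uint8_t value) {
    uint8_t al = value & 0x0F;  /* limit the index to 4 bits */
    return (char)xlatb_model((const uint8_t *)conv_table, al);
}
```

Masking the index first matters: without it, a value above 15 would address memory beyond the 16-entry table.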
The cpuid instruction provides processor identification information. It operates similarly to a function call, with the input value passed in the accumulator (EAX). Depending on the value in EAX, it returns different information about the processor in the EAX, EBX, ECX and EDX registers. For example, if EAX is zero, it returns the vendor identification string, “GenuineIntel” for Intel processors or “AuthenticAMD” for AMD models, in the EBX, EDX and ECX registers (in that order). It is shown in figure 13.
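How the vendor string is assembled from the three registers can be sketched in C. The function below does not execute cpuid; it only takes the three register values as parameters and orders their bytes. The name `vendor_string` is an assumption made for this illustration.

```c
#include <stdint.h>

/* Build the 12-character vendor string from the values returned by cpuid
   with EAX = 0. The registers are read in the order EBX, EDX, ECX, each
   starting from its least significant byte. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13]) {
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++) {
        for (int b = 0; b < 4; b++) {
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
        }
    }
    out[12] = '\0';
}
```

For an Intel processor the register values are EBX = 756E6547h ("Genu"), EDX = 49656E69h ("ineI") and ECX = 6C65746Eh ("ntel").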
The movbe instruction moves data between a register and memory, reversing the order of the data bytes on the way. It operates on words, doublewords or quadwords and is usually used to change the endianness of the data.
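The byte reversal done by movbe on a doubleword can be modelled in C as follows. This is only a sketch of the data transformation; `movbe32_model` is a hypothetical name, and the memory transfer itself is not modelled.

```c
#include <stdint.h>

/* Model of the byte swap applied by movbe to a 32-bit value:
   byte 0 becomes byte 3, byte 1 becomes byte 2, and so on. */
uint32_t movbe32_model(uint32_t x) {
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

Applying the function twice returns the original value, so the same operation converts in both directions between little-endian and big-endian data.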
Cache memory is managed by the processor, and usually, its decisions keep the performance of software execution at a good level. However, the processor offers instructions that allow the programmer to send hints to the cache management mechanism and prefetch data in advance of using it (prefetchw, prefetchwt1) and to synchronise the cache and memory and flush the cache line to make it available for other data (clflush, clflushopt). There are also additional instructions implemented for cache management introduced together with multimedia and vector extensions.
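Some instructions allow for saving and restoring the state of several units of the processor. They are intended to help the operating system switch context quickly between processes, and can be used instead of saving each register separately at the beginning of a subroutine and restoring it at the end. The state is stored in a memory area given as the instruction operand, while the EDX:EAX register pair holds a bit mask that selects which state components are processed. Instructions for saving the state are xsave, xsavec, and xsaveopt. The instruction for restoring the state is xrstor. The related xgetbv instruction does not restore any state; it reads an extended control register (such as XCR0, which reports the enabled state components) into EDX:EAX.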
In the x64 architecture, there are two instructions for generating a random number. These are rdseed and rdrand. The random number is generated by a specially designed hardware unit. The difference between the instructions is that rdseed returns random bits derived directly from an on-chip hardware entropy source. It is slower but offers better randomness. The rdrand returns bits from a pseudorandom number generator that is seeded from the same entropy source. It is faster, and its output is sufficiently secure for most cryptographic applications. Both instructions set the CF flag to 1 when a valid random value has been returned, so the program should check CF and retry if it is clear.
The abbreviation BMI comes from Bit Manipulation Instructions. These instructions are designed for specific manipulations of bits in the arguments, enabling programmers to use a single instruction instead of a few. The andn instruction extends the group of logical instructions. It performs a bitwise AND of the inverted first source operand with the second source operand. There are additional shift and rotate instructions that do not affect flags, which allows for more predictable execution without dependency on flag changes from previous operations. These instructions are rorx - rotate right, sarx - shift arithmetic right, shlx - shift logical left, and shrx - shift logical right. Also, unsigned multiplication without affecting flags, mulx, was introduced. Other instructions manipulate bits, as the group name suggests.
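Two of these instructions can be modelled in C for 32-bit operands. This is a sketch of the computed values only (flags are not modelled); the names `andn32` and `rorx32` are hypothetical.

```c
#include <stdint.h>

/* Model of andn: the first source operand is inverted, then ANDed
   with the second source operand. */
uint32_t andn32(uint32_t src1, uint32_t src2) {
    return ~src1 & src2;
}

/* Model of rorx: rotate right by a count taken modulo the operand size. */
uint32_t rorx32(uint32_t x, unsigned count) {
    count &= 31;
    if (count == 0) {
        return x;  /* avoid shifting by the full width of the type */
    }
    return (x >> count) | (x << (32 - count));
}
```

For example, `andn32(0x0F, 0xFF)` clears the low four bits of FFh and gives F0h, which would otherwise need a separate not and and.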
The lzcnt instruction counts the number of zero bits in an argument starting from the most significant bit. The tzcnt counts zero bits starting from the least significant bit. For an argument that is not zero, lzcnt returns the number of zeros before the first 1 from the left, and tzcnt gives the number of zeros before the first 1 from the right. For an argument equal to zero, both return the operand size in bits. The bextr instruction copies a contiguous field of bits from the source to the destination argument, starting at the chosen position. The third argument specifies both the starting bit position and the number of bits. Bits 7:0 of the third operand specify the starting bit position, while bits 15:8 specify the maximum number of bits to extract, as shown in figure 14.
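The three instructions can be described by the following C models for 32-bit operands. They are written for clarity rather than speed, and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of lzcnt: count zero bits from the most significant bit down. */
unsigned lzcnt32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
        n++;
    }
    return n;  /* 32 when x is zero */
}

/* Model of tzcnt: count zero bits from the least significant bit up. */
unsigned tzcnt32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t bit = 1; bit != 0 && (x & bit) == 0; bit <<= 1) {
        n++;
    }
    return n;  /* 32 when x is zero */
}

/* Model of bextr: control bits 7:0 give the start position,
   control bits 15:8 give the number of bits to extract. */
uint32_t bextr32(uint32_t src, uint32_t control) {
    unsigned start = control & 0xFF;
    unsigned len = (control >> 8) & 0xFF;
    if (start >= 32 || len == 0) {
        return 0;
    }
    uint32_t shifted = src >> start;
    return len >= 32 ? shifted : (shifted & ((1u << len) - 1u));
}
```

For example, extracting 8 bits starting at bit 4 from 12345678h uses the control value 0804h and gives 67h.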
The blsi instruction extracts the single, lowest bit set to one, as shown in figure 15.
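The result of blsi can be expressed with a well-known two's complement identity. The C function below is a model of the computed value only; `blsi32` is a hypothetical name.

```c
#include <stdint.h>

/* Model of blsi: keep only the lowest bit that is set to one.
   In two's complement, x AND (-x) isolates that bit. */
uint32_t blsi32(uint32_t x) {
    return x & (0u - x);
}
```

For the value 101000b (28h) the result is 001000b (08h); for zero the result is zero.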
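The blsmsk instruction creates a mask in which all bits from the least significant bit up to and including the lowest bit set to 1 in the source are set, and all higher bits are cleared. It is shown in figure 16.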
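A compact C model of blsmsk, given as a sketch with the hypothetical name `blsmsk32`:

```c
#include <stdint.h>

/* Model of blsmsk: x XOR (x - 1) sets every bit up to and including
   the lowest set bit of x and clears all higher bits. */
uint32_t blsmsk32(uint32_t x) {
    return x ^ (x - 1u);
}
```

For the value 101000b (28h) the result is 001111b (0Fh). For a source equal to zero, the subtraction wraps around and the result has all bits set.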
The blsr instruction resets (clears the bit to zero value) the lowest set bit. It is shown in figure 17.
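The value computed by blsr corresponds to another classic bit trick, shown here as a C model with the hypothetical name `blsr32`:

```c
#include <stdint.h>

/* Model of blsr: x AND (x - 1) clears the lowest set bit of x
   and leaves all other bits unchanged. */
uint32_t blsr32(uint32_t x) {
    return x & (x - 1u);
}
```

For the value 101000b (28h) the result is 100000b (20h). Applying the operation repeatedly until the value reaches zero visits every set bit once, which is a common way to iterate over the bits of a mask.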
The bzhi instruction resets high bits starting from the specified bit position, as shown in figure 18.
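A C model of bzhi for 32-bit operands is sketched below. The name `bzhi32` is hypothetical; the position is taken from the low 8 bits of the second operand, and a position equal to or larger than the operand size leaves the source unchanged.

```c
#include <stdint.h>

/* Model of bzhi: clear all bits of src at positions >= index. */
uint32_t bzhi32(uint32_t src, uint32_t index) {
    unsigned n = index & 0xFF;       /* only the low 8 bits are used */
    if (n >= 32) {
        return src;                  /* nothing to clear */
    }
    return src & ((1u << n) - 1u);   /* keep the n lowest bits */
}
```

For example, with the position 16, the value 12345678h becomes 00005678h.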
The pdep instruction performs a parallel deposit of bits using a mask. Its behaviour is shown in figure 19.
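The parallel deposit can be modelled in C with a simple loop: consecutive low bits of the source are placed at the positions where the mask contains ones. This is an illustrative sketch, not an efficient implementation, and `pdep32` is a hypothetical name.

```c
#include <stdint.h>

/* Model of pdep: the k-th lowest bit of src is copied to the position
   of the k-th lowest set bit of mask; all other result bits are zero. */
uint32_t pdep32(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask);  /* lowest set bit of mask */
        if (src & bit) {
            result |= lowest;
        }
        mask &= mask - 1u;                     /* remove that mask bit */
    }
    return result;
}
```

For example, depositing the source 0101b under the mask F0h places the four source bits at positions 4 to 7 and gives 50h.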
The pext instruction performs a parallel extraction of bits using a mask. Its behaviour is shown in figure 20.
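The parallel extraction is the inverse operation and can be modelled in C in the same way: the source bits selected by the mask are packed into the low bits of the result. This is an illustrative sketch, and `pext32` is a hypothetical name.

```c
#include <stdint.h>

/* Model of pext: the src bit at the position of the k-th lowest set bit
   of mask is copied to bit k of the result; upper result bits are zero. */
uint32_t pext32(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask);  /* lowest set bit of mask */
        if (src & lowest) {
            result |= bit;
        }
        mask &= mask - 1u;                     /* remove that mask bit */
    }
    return result;
}
```

For example, extracting from 12345678h with the mask 00FF0000h gives 34h, the third byte of the source moved to the lowest position.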