Differences

This shows you the differences between two versions of the page.

en:multiasm:papc:chapter_6_7 [2025/10/21 12:33] – [Cache manipulating instructions] ktokarz
en:multiasm:papc:chapter_6_7 [2025/10/23 12:56] (current) – [BMI1 and BMI2 Instructions] ktokarz
Line 283: Line 283:
  
 The **set//cc//** instruction sets the argument to 1 if the chosen condition is met, or clears the argument if the condition is not met. The condition can be freely chosen from the set of conditions available for other instructions, for example, **cmov//cc//**. This instruction is useful to convert the result of the operation into the Boolean representation.
 +
 +The **popcnt** instruction counts the number of bits equal to "1" in its source operand. Applications of this instruction include genome mining, handwriting recognition, digital health workloads, and fast Hamming distance calculation((https://patents.google.com/patent/US8214414)).
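The counting described above can be modelled in software. The following is a minimal sketch in Python, not the hardware implementation; the function names and the ''width'' parameter are assumptions made for illustration. It also shows the Hamming distance use case: XOR the two values, then count the ones.

```python
def popcnt(x: int, width: int = 64) -> int:
    """Software model of popcnt: number of 1 bits in a width-bit operand."""
    x &= (1 << width) - 1          # keep only the bits of the operand
    return bin(x).count("1")


def hamming_distance(a: int, b: int, width: int = 64) -> int:
    """Number of bit positions in which a and b differ."""
    return popcnt(a ^ b, width)    # XOR marks differing bits with 1


print(popcnt(0b10110001))                 # 4
print(hamming_distance(0b1100, 0b1010))   # 2
```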
 +
 +The **crc32** instruction calculates the cyclic redundancy check in hardware. The generator polynomial is fixed and has the value 11EDC6F41h (the CRC-32C, or Castagnoli, polynomial).
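To illustrate what the fixed polynomial means, here is a bit-by-bit software model in Python. It is a sketch under assumptions: the names are hypothetical, the polynomial is used in its bit-reflected form, and the initial value and final inversion follow the usual CRC-32C convention applied by software around the accumulation step.

```python
# Bit-reflected form of 1EDC6F41h (the leading 1 of 11EDC6F41h is implicit).
POLY_REFLECTED = 0x82F63B78


def crc32_step(crc: int, byte: int) -> int:
    """Accumulate one byte into a 32-bit CRC value, one bit at a time."""
    crc ^= byte
    for _ in range(8):
        # Shift right; if a 1 was shifted out, subtract (XOR) the polynomial.
        crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data: bytes) -> int:
    """CRC-32C of a byte string using the conventional init and final XOR."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32_step(crc, b)
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))   # 0xe3069283, the standard CRC-32C check value
```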
 +
 ===== Control transfer instructions =====
 Before describing the instructions used for control transfer, we will discuss how the destination address can be calculated. The destination address is the address given to the processor to make a jump to.
Line 518: Line 523:
 Cache memory is managed by the processor, and usually, its decisions keep the performance of software execution at a good level. However, the processor offers instructions that allow the programmer to send hints to the cache management mechanism and prefetch data in advance of using it (**prefetchw**, **prefetchwt1**) and to synchronise the cache and memory and flush the cache line to make it available for other data (**clflush**, **clflushopt**). There are also additional instructions implemented for cache management introduced together with multimedia and vector extensions.
 ===== User Mode Extended State Save/Restore Instructions =====
-XSAVE Save processor extended states to memory.
-XSAVEC Save processor extended states with compaction to memory.
-XSAVEOPT Save processor extended states to memory, optimized.
-XRSTOR Restore processor extended states from memory.
-XGETBV Reads the state of an extended control register.
 +Some instructions save and restore the state of several processor units at once. They are intended to help operating systems switch context quickly between processes, and they can be used instead of saving each register separately at the beginning of a subroutine and restoring it at the end. The state is stored in a memory area given as the instruction operand, while the EDX:EAX register pair holds a bit mask that selects which state components are saved or restored. Instructions for saving the state are **xsave**, **xsavec**, and **xsaveopt**. The instruction for restoring the state is **xrstor**. The **xgetbv** instruction reads an extended control register, which tells which state components are enabled.
  
 ===== Random Number Generator Instructions =====
-RDRAND Retrieves a random number generated from hardware.
-RDSEED Retrieves a random number generated from hardware.
 +In the x64 architecture, there are two instructions for generating a random number: **rdseed** and **rdrand**. Both obtain the number from a specially designed hardware unit. The **rdseed** instruction returns random bits derived from an on-chip entropy source. It is slower but offers better randomness. The **rdrand** instruction returns bits from a pseudorandom number generator that is seeded from the same entropy source. It is faster, and its output is sufficiently secure for most cryptographic applications. Both instructions set the carry flag to signal that a valid value was returned, so software should check it before using the result.
  
 ===== BMI1 and BMI2 Instructions =====
-ANDN Bitwise AND of first source with inverted 2nd source operands.
-BEXTR Contiguous bitwise extract.
-BLSI Extract lowest set bit.
-BLSMSK Set all lower bits below first set bit to 1.
-BLSR Reset lowest set bit.
-BZHI Zero high bits starting from specified bit position.
 +The abbreviation BMI stands for Bit Manipulation Instructions. These instructions perform specific manipulations of the bits in their arguments, enabling programmers to use a single instruction instead of several.
 +The **andn** instruction extends the group of logical instructions. It performs a bitwise AND of the inverted first source operand with the second source operand.
 +There are additional shift and rotate instructions that do not affect flags, which allows for more predictable execution without dependency on flag changes from previous operations. These instructions are **rorx** - rotate right, **sarx** - shift arithmetic right, **shlx** - shift logical left, and **shrx** - shift logical right.
 +Also, unsigned multiplication without affecting flags, **mulx**, was introduced.
 +Other instructions manipulate bits, as the group name suggests.
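The results of **andn**, **rorx**, and **mulx** can be modelled in software. The sketch below is written in Python; the function names and the ''width'' parameter are assumptions made for illustration. It follows the Intel definition of **andn**, in which the first source operand is the one that is inverted. Flags are not modelled, which matches the flagless nature of **rorx** and **mulx**.

```python
def andn(src1: int, src2: int, width: int = 32) -> int:
    """Model of andn: (NOT src1) AND src2, truncated to the operand width."""
    mask = (1 << width) - 1
    return (~src1 & src2) & mask


def rorx(value: int, count: int, width: int = 32) -> int:
    """Model of rorx: rotate right by count bits; no flags are produced."""
    mask = (1 << width) - 1
    value &= mask
    count %= width
    return ((value >> count) | (value << (width - count))) & mask


def mulx(a: int, b: int, width: int = 32) -> tuple:
    """Model of mulx: unsigned multiply, returned as (high half, low half)."""
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)
    return product >> width, product & mask


print(bin(andn(0b1100, 0b1010)))    # 0b10
print(hex(rorx(0x12345678, 8)))     # 0x78123456
```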
-LZCNT Count the number leading zero bits.
-MULX Unsigned multiply without affecting arithmetic flags.
 +
 +The **lzcnt** instruction counts the number of zeros in an argument starting from the most significant bit. The **tzcnt** instruction counts zeros starting from the least significant bit. For an argument that is not zero, **lzcnt** returns the number of zeros before the first 1 from the left, and **tzcnt** gives the number of zeros before the first 1 from the right. For an argument equal to zero, both instructions return the operand size in bits.
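A minimal software model of both counts, written in Python, may help to check results by hand. The function names and the ''width'' parameter are assumptions made for illustration.

```python
def lzcnt(x: int, width: int = 32) -> int:
    """Model of lzcnt: zeros above the most significant 1 bit."""
    x &= (1 << width) - 1
    return width - x.bit_length()      # bit_length() is 0 for x == 0


def tzcnt(x: int, width: int = 32) -> int:
    """Model of tzcnt: zeros below the least significant 1 bit."""
    x &= (1 << width) - 1
    if x == 0:
        return width                   # zero input gives the operand size
    return (x & -x).bit_length() - 1   # x & -x isolates the lowest set bit


print(lzcnt(0x000000F0), tzcnt(0x000000F0))   # 24 4
```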
-PDEP Parallel deposit of bits using a mask.
-PEXT Parallel extraction of bits using a mask.
-RORX Rotate right without affecting arithmetic flags.
-SARX Shift arithmetic right.
-SHLX Shift logic left.
-SHRX Shift logic right.
-TZCNT Count the number trailing zero bits.
 +The **bextr** instruction copies a contiguous field of bits from the source to the destination, starting at a chosen position. The third argument specifies both the starting bit position and the number of bits: bits 7:0 of the third operand hold the starting bit position, while bits 15:8 hold the maximum number of bits to extract, as shown in figure {{ref>bextr_instr}}.
 +
 +<figure bextr_instr>
 +{{ :en:multiasm:cs:bextr.png?400 |Illustration of bit extraction instruction}}
 +<caption>Illustration of bit extraction instruction</caption>
 +</figure>
 +
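The decoding of the control operand of **bextr** can be sketched in Python as follows. The function name and the ''width'' parameter are assumptions made for illustration; the field layout (start in bits 7:0, length in bits 15:8) is the one described for the instruction.

```python
def bextr(src: int, control: int, width: int = 32) -> int:
    """Model of bextr: extract `length` bits of src beginning at bit `start`."""
    start = control & 0xFF             # bits 7:0 of the control operand
    length = (control >> 8) & 0xFF     # bits 15:8 of the control operand
    src &= (1 << width) - 1
    if start >= width:
        return 0                       # nothing left to extract
    return (src >> start) & ((1 << length) - 1)


control = (8 << 8) | 4                 # length = 8 bits, start = bit 4
print(hex(bextr(0xABCD1234, control))) # 0x23
```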
 +The **blsi** instruction extracts the lowest bit set to one: the result has only that bit set, and all other bits are cleared, as shown in figure {{ref>blsi_instr}}.
 + 
 +<figure blsi_instr> 
 +{{ :en:multiasm:cs:blsi.png?400 |Illustration of the lowest set bit extraction instruction}} 
 +<caption>Illustration of the lowest set bit extraction instruction</caption>
 +</figure> 
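The result of **blsi** corresponds to a well-known bit trick: ANDing a value with its two's complement negation. A minimal Python sketch, with a hypothetical function name and ''width'' parameter:

```python
def blsi(x: int, width: int = 32) -> int:
    """Model of blsi: keep only the lowest set bit of x."""
    mask = (1 << width) - 1
    x &= mask
    return (x & -x) & mask     # -x flips every bit above the lowest 1


print(bin(blsi(0b10110100)))   # 0b100
```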
 + 
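 +The **blsmsk** instruction creates a mask in which all bits from the least significant bit up to and including the lowest bit set to 1 are set, and all higher bits are cleared. It is shown in figure {{ref>blsmsk_instr}}.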
 + 
 +<figure blsmsk_instr> 
 +{{ :en:multiasm:cs:blsmsk.png?400 |Illustration of the instruction which sets all bits up to the lowest bit set to 1.}}
 +<caption>Illustration of the instruction which sets all bits up to the lowest bit set to 1</caption>
 +</figure> 
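The mask produced by **blsmsk** equals the source XORed with the source minus one. A minimal Python sketch, with a hypothetical function name and ''width'' parameter:

```python
def blsmsk(x: int, width: int = 32) -> int:
    """Model of blsmsk: ones from bit 0 up to and including the lowest set bit."""
    mask = (1 << width) - 1
    x &= mask
    return (x ^ (x - 1)) & mask   # x - 1 flips the lowest 1 and the zeros below it


print(bin(blsmsk(0b10110100)))    # 0b111
```

For a source equal to zero, the model returns a value with all bits set, because subtracting one wraps around.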
 + 
 +The **blsr** instruction resets (clears to zero) the lowest bit set to 1 and leaves all other bits unchanged. It is shown in figure {{ref>blsr_instr}}.
 + 
 +<figure blsr_instr> 
 +{{ :en:multiasm:cs:blsr.png?400 |Illustration of the instruction which resets the lowest bit set to 1.}}
 +<caption>Illustration of the instruction which resets the lowest bit set to 1</caption>
 +</figure> 
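The result of **blsr** equals the source ANDed with the source minus one. A minimal Python sketch, with a hypothetical function name and ''width'' parameter:

```python
def blsr(x: int, width: int = 32) -> int:
    """Model of blsr: clear the lowest set bit of x."""
    mask = (1 << width) - 1
    x &= mask
    return (x & (x - 1)) & mask


print(bin(blsr(0b10110100)))   # 0b10110000
```

Applying the operation repeatedly until the value reaches zero visits each set bit once, which is a common way to iterate over the set bits of a word.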
 + 
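 +The **bzhi** instruction resets the high bits starting from the specified bit position: the bit at that position and all bits above it are cleared, while the bits below it are copied unchanged, as shown in figure {{ref>bzhi_instr}}.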
 + 
 +<figure bzhi_instr> 
 +{{ :en:multiasm:cs:bzhi.png?400 |Illustration of the instruction which resets high bits starting from the specified bit position.}} 
 +<caption>Illustration of the instruction which resets high bits starting from the specified bit position</caption> 
 +</figure> 
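A minimal Python model of **bzhi** follows. The function name and the ''width'' parameter are assumptions made for illustration; the sketch assumes that the position is taken from the low 8 bits of the index operand and that a position at or beyond the operand size leaves the source unchanged.

```python
def bzhi(src: int, index: int, width: int = 32) -> int:
    """Model of bzhi: clear all bits of src at positions >= index."""
    src &= (1 << width) - 1
    n = index & 0xFF               # only the low 8 bits of the index are used
    if n >= width:
        return src                 # nothing to clear
    return src & ((1 << n) - 1)


print(hex(bzhi(0x12345678, 16)))   # 0x5678
```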
 + 
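 +The **pdep** instruction performs a parallel deposit of bits using a mask. It takes consecutive low-order bits of the source and places them at the positions where the mask has bits set to 1. All other bits of the result are cleared. Its behaviour is shown in figure {{ref>pdep_instr}}.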
 + 
 +<figure pdep_instr> 
 +{{ :en:multiasm:cs:pdep.png?600 |Illustration of the parallel deposit instruction}} 
 +<caption>Illustration of the parallel deposit instruction</caption> 
 +</figure> 
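The parallel deposit can be modelled with a simple loop over the mask bits. This Python sketch processes one bit at a time for clarity, whereas the instruction does the work in a single operation; the function name and the ''width'' parameter are assumptions made for illustration.

```python
def pdep(src: int, mask: int, width: int = 32) -> int:
    """Model of pdep: scatter the low bits of src to the 1 positions of mask."""
    result = 0
    k = 0                              # index of the next source bit to deposit
    for i in range(width):
        if (mask >> i) & 1:
            if (src >> k) & 1:
                result |= 1 << i
            k += 1
    return result


print(bin(pdep(0b101, 0b11010)))       # 0b10010
```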
 + 
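 +The **pext** instruction performs a parallel extraction of bits using a mask. It takes the source bits at the positions where the mask has bits set to 1 and packs them into consecutive low-order bits of the result. The remaining high bits of the result are cleared. Its behaviour is shown in figure {{ref>pext_instr}}.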
 + 
 +<figure pext_instr> 
 +{{ :en:multiasm:cs:pext.png?600 |Illustration of the parallel extraction instruction}} 
 +<caption>Illustration of the parallel extraction instruction</caption> 
 +</figure> 
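The parallel extraction is the inverse operation of the parallel deposit and can be modelled in the same way. This Python sketch processes one bit at a time for clarity; the function name and the ''width'' parameter are assumptions made for illustration.

```python
def pext(src: int, mask: int, width: int = 32) -> int:
    """Model of pext: gather the src bits selected by mask into the low bits."""
    result = 0
    k = 0                              # index of the next result bit to fill
    for i in range(width):
        if (mask >> i) & 1:
            if (src >> i) & 1:
                result |= 1 << k
            k += 1
    return result


print(hex(pext(0x12345678, 0xFF00FF00)))   # 0x1256
```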
en/multiasm/papc/chapter_6_7.1761049987.txt.gz · Last modified: 2025/10/21 12:33 by ktokarz
CC Attribution-Share Alike 4.0 International