Differences

--- en:multiasm:papc:chapter_6_11 [2025/11/11 16:10] – [SSSE3] ktokarz
+++ en:multiasm:papc:chapter_6_11 [2025/11/11 21:39] (current) – [AVX] ktokarz
@@ Line 277: / Line 277: @@
 </table>
-Two data shuffle instructions are worth mentioning. One makes copies of bytes
+Two data shuffle instructions are worth mentioning. The **pshufb** instruction makes copies of bytes from the first 128-bit operand, which is also the destination, based on the control information taken from the second 128-bit operand. Each byte of the control operand determines the result byte at the same position:
+  * bit 7 is 1 - the result byte is cleared
+  * bit 7 is 0 - the result byte contains a copy of a source byte
+  * bits 0-3 - the index of the source byte to be copied
+The illustration is shown in figure {{ref>sse3pshufb}}.
+<figure sse3pshufb>
+{{ :en:multiasm:cs:sse3pshufb.png?650 |Illustration of a byte shuffle instruction}}
+<caption>The illustration of a byte shuffle instruction</caption>
+</figure>
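The control-byte rules above can be modelled in a few lines of Python. The sketch below is a scalar model of the 128-bit **pshufb** on byte strings, written for illustration only; the function name `pshufb` is a hypothetical helper that mirrors the mnemonic, not a real library call.

```python
def pshufb(data: bytes, ctrl: bytes) -> bytes:
    """Hypothetical scalar model of the 128-bit pshufb instruction.

    data - the 16 bytes of the first operand (source and destination)
    ctrl - the 16 control bytes of the second operand
    """
    if len(data) != 16 or len(ctrl) != 16:
        raise ValueError("both operands must be 16 bytes long")
    result = bytearray(16)
    for i, c in enumerate(ctrl):
        if c & 0x80:
            result[i] = 0               # bit 7 set: the result byte is cleared
        else:
            result[i] = data[c & 0x0F]  # bits 0-3 select the source byte
    return bytes(result)


data = bytes(range(0xA0, 0xB0))                 # source bytes A0 to AF
ctrl = bytes([0x80] + list(range(14, -1, -1)))  # clear byte 0, bytes 1-15 take source bytes 14-0
print(pshufb(data, ctrl).hex(" "))
```

With this control operand, byte 0 of the result is cleared and bytes 1 to 15 receive source bytes 14 down to 0, so the printed result starts with `00 ae ad` and ends with `a1 a0`.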
-An interesting description of a variety of x64 AVX instructions is available on website ((https://www.officedaytime.com/simd512e/)).
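+The **palignr** (packed align right) instruction combines bytes from two operands, as shown in figure {{ref>sse3palignr}}. It concatenates the first operand (upper half) with the second operand (lower half), shifts the 256-bit value right by the number of bytes given in the third, immediate operand, and stores the lower 128 bits in the first operand. The immediate therefore determines the position of the byte split; in the figure, it is equal to 2.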
+<figure sse3palignr>
+{{ :en:multiasm:cs:sse3palignr.png?650 |Illustration of an aligned byte combine instruction}}
+<caption>The illustration of an aligned byte combine instruction</caption>
+</figure>
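The concatenate-and-shift behaviour can be sketched in Python as follows. This is a minimal scalar model assuming the Intel operand order `palignr dst, src, imm8`, where the destination forms the upper 16 bytes and the source the lower 16 bytes of the combined value; the function `palignr` is a hypothetical helper named after the mnemonic.

```python
def palignr(dst: bytes, src: bytes, imm8: int) -> bytes:
    """Hypothetical scalar model of the 128-bit palignr dst, src, imm8."""
    if len(dst) != 16 or len(src) != 16 or not 0 <= imm8 <= 255:
        raise ValueError("expected two 16-byte operands and an 8-bit immediate")
    combined = src + dst                   # src is the low 16 bytes, dst the high 16 bytes
    shifted = combined[imm8:] + bytes(32)  # shift right by imm8 bytes, filling with zeros
    return shifted[:16]                    # the low 16 bytes become the new destination


first = bytes(range(0x10, 0x20))   # destination operand: bytes 10 to 1F
second = bytes(range(0x00, 0x10))  # source operand: bytes 00 to 0F
print(palignr(first, second, 2).hex(" "))
```

With the immediate equal to 2, the result consists of bytes 2 to 15 of the source followed by bytes 0 and 1 of the destination, so the sketch prints the bytes 02 to 11 in ascending order. An immediate of 32 or more shifts everything out and yields zero.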
+===== SSE4 =====
+SSE4 is composed of SSE4.1 and SSE4.2. These groups include instructions supplementing the previous extensions. For example, there are eight instructions that extend packed integer minimum and maximum determination to further data types, and twelve instructions that convert packed integers to wider formats with sign extension or zero extension.
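The difference between sign extension and zero extension can be shown with a scalar Python model of one pair of the conversion instructions, **pmovsxbw** and **pmovzxbw**, which widen the lower eight bytes of the source to eight 16-bit words. The function names are hypothetical helpers mirroring the mnemonics; the other ten conversions follow the same idea for other element sizes.

```python
def pmovsxbw(src: bytes) -> list:
    """Hypothetical model of pmovsxbw: sign-extend the low 8 bytes to words."""
    # a byte with bit 7 set is negative, so the upper byte of the word is filled with ones
    return [(b | 0xFF00) if b & 0x80 else b for b in src[:8]]


def pmovzxbw(src: bytes) -> list:
    """Hypothetical model of pmovzxbw: zero-extend the low 8 bytes to words."""
    # the upper byte of each word is always zero
    return list(src[:8])


src = bytes([0x7F, 0x80, 0xFF, 0x01]) + bytes(12)
print([f"{w:04X}" for w in pmovsxbw(src)])  # words after sign extension
print([f"{w:04X}" for w in pmovzxbw(src)])  # words after zero extension
```

The byte 80h becomes FF80h (-128) with sign extension but 0080h (128) with zero extension, while 7Fh gives 007Fh in both cases.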
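+The **dpps** and **dppd** instructions calculate the dot product of vectors of four single-precision or two double-precision values, respectively. The calculation is controlled with the third, immediate operand: its upper bits select which element products are included in the sum, and its lower bits select which destination elements receive the result, while the other destination elements are cleared. The example showing the **dppd** instruction is presented in figure {{ref>sse4dotproduct}}.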
+<figure sse4dotproduct>
+{{ :en:multiasm:cs:sse4dotproduct.png?650 |Illustration of a dot product calculation instruction}}
+<caption>The illustration of a dot product calculation instruction</caption>
+</figure>
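The role of the immediate can be sketched with a scalar Python model of **dppd**, in which bits 5:4 select the products and bits 1:0 select the destination elements. The function `dppd` is a hypothetical helper; Python floats are double precision, so the model matches the precision of **dppd**, whereas **dpps** uses four single-precision elements with bits 7:4 and 3:0 and is not modelled here.

```python
def dppd(a, b, imm8: int) -> list:
    """Hypothetical model of dppd on two double-precision elements."""
    # bits 5:4 of the immediate select the products included in the sum
    products = [a[i] * b[i] if (imm8 >> (4 + i)) & 1 else 0.0 for i in range(2)]
    total = products[0] + products[1]
    # bits 1:0 select the destination elements receiving the sum; the others are cleared
    return [total if (imm8 >> j) & 1 else 0.0 for j in range(2)]


a = [1.5, 2.0]
b = [4.0, 2.5]
print(dppd(a, b, 0x31))  # both products summed, stored in element 0 only
print(dppd(a, b, 0x22))  # only the second product, stored in element 1 only
```

With the immediate 31h the model returns `[11.0, 0.0]`, and with 22h it returns `[0.0, 5.0]`.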
+There are also advanced shuffle, insert and extract instructions which make it possible to manipulate the positions of data of various types. The **insertps** instruction inserts a scalar single-precision floating-point value into a vector. The 8-bit immediate controls the operation: bits 7:6 select the source element, bits 5:4 select the destination position, and bits 3:0 form a mask of destination elements to be cleared. The example showing the **insertps** instruction is presented in figure {{ref>sse4insertps}}. In this example, the immediate contains the bit value of 10011000b.
+<figure sse4insertps>
+{{ :en:multiasm:cs:sse4insertps.png?500 |Illustration of an example of an advanced shuffle instruction}}
+<caption>The illustration of an example of an advanced shuffle instruction</caption>
+</figure>
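The immediate fields of **insertps** can be decoded with a short Python sketch. It models only the register-source form (with a memory source, the source element field is ignored and the 32-bit value from memory is used), and the function `insertps` is a hypothetical helper named after the mnemonic.

```python
def insertps(dst, src, imm8: int) -> list:
    """Hypothetical model of insertps with a register source operand."""
    count_s = (imm8 >> 6) & 0b11  # bits 7:6: element taken from the source
    count_d = (imm8 >> 4) & 0b11  # bits 5:4: element written in the destination
    zmask = imm8 & 0b1111         # bits 3:0: destination elements to clear
    result = list(dst)            # work on a copy, the input list is not modified
    result[count_d] = src[count_s]
    for i in range(4):
        if (zmask >> i) & 1:
            result[i] = 0.0
    return result


dst = [1.0, 2.0, 3.0, 4.0]
src = [10.0, 20.0, 30.0, 40.0]
print(insertps(dst, src, 0b10011000))
```

For the immediate 10011000b from the example, element 2 of the source is copied to element 1 of the destination and element 3 of the destination is cleared, so the sketch prints `[1.0, 30.0, 3.0, 0.0]`.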
+In SSE4.2, a set of string compare instructions was added. Because an XMM register holds sixteen bytes, string processing algorithms can be implemented much more efficiently with XMM registers than with the general-purpose registers and the string instructions of the main processor. There are four string compare instructions (see table {{ref>sse4stringtable}}), and each of them can be configured to perform different functions. The length of the strings can be explicit or implicit. Explicit length means that the length of the first operand is specified in the RAX register and the length of the second operand in the RDX register (EAX and EDX are used when the instruction is encoded without the REX.W prefix). Implicit length means that both operands contain null-terminated strings. The instructions can produce two kinds of results. The index variants return in the ECX register the index of the first or the last element for which the comparison result is true. The mask variants return in the XMM0 register either a bit mask, with one bit per element position, or a mask with one element-sized value per position, set to all ones or all zeros similarly to the result of the MMX compare instructions.
+<table sse4stringtable>
+<caption>SSE4.2 string compare instructions</caption>
+^ Instruction ^ length ^ type of the result ^
+| **pcmpestri** | explicit | index |
+| **pcmpestrm** | explicit | mask |
+| **pcmpistri** | implicit | index |
+| **pcmpistrm** | implicit | mask |
+</table>
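The length and result conventions described above can be sketched in Python for byte elements. The helpers below are hypothetical: `valid_length` computes the number of valid elements for implicit or explicit length (for explicit length, the absolute value of the register saturated to 16 bytes), and `result_bits` stands for the final per-element comparison results, which Intel's documentation calls IntRes2. Bit 6 of the immediate selects between the first and the last index, and between the bit mask (simply these bits zero-extended into XMM0) and the element-sized mask.

```python
from typing import Optional


def valid_length(block: bytes, explicit: Optional[int] = None) -> int:
    """Number of valid byte elements in a 16-byte operand (hypothetical helper)."""
    if explicit is None:
        # implicit length: the elements before the first zero byte are valid
        nul = block.find(0)
        return 16 if nul < 0 else nul
    # explicit length: absolute value of RAX or RDX, saturated to 16 byte elements
    return min(abs(explicit), 16)


def index_result(result_bits: int, most_significant: bool = False) -> int:
    """Index form (ECX): position of the first or last set bit, 16 if none."""
    if result_bits == 0:
        return 16
    if most_significant:
        return result_bits.bit_length() - 1
    return (result_bits & -result_bits).bit_length() - 1  # isolate the lowest set bit


def byte_mask_result(result_bits: int) -> bytes:
    """Mask form with element-sized values: FFh where the bit is set, 00h otherwise."""
    return bytes(0xFF if (result_bits >> i) & 1 else 0x00 for i in range(16))


bits = 0b0000_0000_0010_0100  # comparison true at element positions 2 and 5
print(valid_length(b"hello\0world!!!!!"), valid_length(bytes(16), explicit=-20))
print(index_result(bits), index_result(bits, most_significant=True))
print(byte_mask_result(bits).hex(" "))
```

The implicit length of `hello` followed by a zero byte is 5, an explicit length of -20 saturates to 16, and the example result bits give the indexes 2 (first) and 5 (last).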
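+The third operand, an 8-bit immediate, encodes the input data type (table {{ref>SSE4stringdata}}), the comparison method (table {{ref>SSE4stringcomparisonmethod}}) and the form of the result.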
+<table SSE4stringdata>
+<caption>SSE4.2 string compare input data</caption>
+^ bits 1:0 ^ data type ^
+| 00 | unsigned BYTE |
+| 01 | unsigned WORD |
+| 10 | signed BYTE |
+| 11 | signed WORD |
+</table>
+<table SSE4stringcomparisonmethod>
+<caption>SSE4.2 string compare method encoding</caption>
+^ bits 3:2 ^ operation ^ comment ^
+| 00 | Equal Any | find any of the specified characters in the input string |
+| 01 | Ranges | check if characters are within the specified ranges |
+| 10 | Equal Each | check if the input strings are equal |
+| 11 | Equal Ordered | check if the needle string is in the haystack string |
+</table>
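The two encoding tables can be turned into a small helper that builds the lower four bits of the immediate. The sketch also models one complete configuration, **pcmpistri** with the immediate 00h: unsigned bytes, the Equal Any method and, with the remaining bits equal to zero, the index of the first match. It assumes, as in Intel's documentation, that the first operand holds the set of characters and the second operand the string being searched. The dictionaries and function names are hypothetical, and the other methods and result forms are not modelled.

```python
DATA_TYPE = {"unsigned byte": 0b00, "unsigned word": 0b01,
             "signed byte": 0b10, "signed word": 0b11}           # bits 1:0
METHOD = {"equal any": 0b00, "ranges": 0b01,
          "equal each": 0b10, "equal ordered": 0b11}             # bits 3:2


def string_imm8(data_type: str, method: str) -> int:
    """Combine the data type and method fields; the upper bits are left zero."""
    return DATA_TYPE[data_type] | (METHOD[method] << 2)


def pcmpistri_equal_any(charset: bytes, text: bytes) -> int:
    """Hypothetical model of pcmpistri with imm8 = 00h, returning the ECX value."""
    def implicit_length(block: bytes) -> int:
        nul = block.find(0)               # implicit length ends at the first zero byte
        return 16 if nul < 0 else nul

    a = charset[:16].ljust(16, b"\0")     # first operand: set of characters
    b = text[:16].ljust(16, b"\0")        # second operand: string being searched
    wanted = set(a[:implicit_length(a)])
    for j in range(implicit_length(b)):
        if b[j] in wanted:
            return j                      # index of the first matching character
    return 16                             # no match: 16 for byte elements


print(hex(string_imm8("unsigned byte", "equal ordered")))
print(pcmpistri_equal_any(b" ,.", b"hello, world"))
```

Searching `hello, world` for any of the characters space, comma or full stop finds the comma, so the model returns 5. Characters after a zero byte are ignored, because they lie outside the implicit length.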
+The SSE4.2 string compare instructions are powerful tools for processing byte or word strings. A detailed explanation of their behaviour, together with illustrations, is available on the website ((https://www.officedaytime.com/simd512e/simdimg/str.php?f=pcmpestri)).
+===== AVX =====
+AVX stands for Advanced Vector Extensions. AVX introduces 256-bit YMM registers, which extend the XMM registers: each XMM register is the lower half of the corresponding YMM register. In 64-bit mode, sixteen YMM registers (YMM0 to YMM15) are available. Many SSE instructions are extended to operate on the new, wider data types; their AVX forms use the SSE mnemonics prefixed with the letter v, for example **vaddps** for **addps**. The most important improvement in the instruction set of x64 processors is the implementation of RISC-like instructions in which the destination operand can differ from both source operands. This three-operand SIMD instruction format is made possible by the new VEX encoding scheme. The AVX2 extension implements more SIMD instructions for 256-bit registers, notably the integer instructions, which AVX supports only for 128-bit operands. AVX-512 extends the register size to 512 bits (ZMM registers) and, in 64-bit mode, increases the number of vector registers to 32. An interesting, comprehensive description of a variety of x64 AVX instructions is available on the website ((https://www.officedaytime.com/simd512e/)).
