Differences

This shows you the differences between two versions of the page.

--- en:multiasm:papc:chapter_6_11 [2025/11/11 18:16] – [SSE4] ktokarz
+++ en:multiasm:papc:chapter_6_11 [2025/11/11 21:39] (current) – [AVX] ktokarz
@@ Line 302: / Line 302: @@
 </figure>
-There are also advanced shuffle, insert and extract instructions which make it possible to manipulate positions of the data of various types. A few examples will be shown in the following figures. The **insertps** inserts a scalar single-precision floating-point value with the position of the vector's element in source and destination controlled with an 8-bit immediate. The example showing the **insertps** instruction is presented in figure {{ref>sse4insertps}}.
+There are also advanced shuffle, insert and extract instructions which make it possible to manipulate the positions of data of various types. A few examples will be shown in the following figures. The **insertps** instruction inserts a scalar single-precision floating-point value, with the positions of the vector element in the source and in the destination controlled by an 8-bit immediate. An example of the **insertps** instruction is presented in figure {{ref>sse4insertps}}. In this example, the immediate has the binary value 10011000b.
 <figure sse4insertps>
-{{ :en:multiasm:cs:sse4insertps.png?550 |Illustration of an example of an advanced shuffle instruction}}
+{{ :en:multiasm:cs:sse4insertps.png?500 |Illustration of an example of an advanced shuffle instruction}}
 <caption>The illustration of an example of an advanced shuffle instruction</caption>
 </figure>
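The way the immediate steers **insertps** can be sketched as a small scalar model. This is an illustrative Python sketch, not processor code: the function name ''insertps'' and the list representation of a register (element 0 is the lowest lane) are assumptions made for the example, and a register source operand is assumed. Bits 7:6 of the immediate select the source element, bits 5:4 select the destination element, and bits 3:0 form a zero mask applied to the result.

```python
def insertps(dest, src, imm8):
    """Scalar model of insertps with a register source (illustrative only).

    dest and src are lists of four floats; index 0 is the lowest element.
    """
    count_s = (imm8 >> 6) & 0b11  # bits 7:6: element taken from src
    count_d = (imm8 >> 4) & 0b11  # bits 5:4: element of dest to overwrite
    zmask = imm8 & 0b1111         # bits 3:0: elements of the result to clear

    result = list(dest)
    result[count_d] = src[count_s]
    for i in range(4):
        if zmask & (1 << i):
            result[i] = 0.0
    return result
```

With the immediate 10011000b, the model copies source element 2 into destination element 1 and clears element 3, for example ''insertps([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 0b10011000)''.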
+In SSE4.2, a set of string compare instructions was added. As an XMM register can hold sixteen bytes, it is much more efficient to implement string processing algorithms with XMM registers than with the general-purpose registers of the main processor and the classic string instructions. There are four string compare instructions (see table {{ref>sse4stringtable}}), and each of them can be configured to achieve different functionalities. The length of the strings can be explicit or implicit. Explicit length means that the length of the first operand is specified in the EAX/RAX register, and the length of the second operand is specified in the EDX/RDX register. Implicit length means that both operands contain null-terminated strings. The instructions can produce two kinds of results. Index means that the index of the first or last matching element is returned in the ECX register. Mask means that a mask is returned in the XMM0 register, either as a bit mask (one bit for each pair of elements compared) or as a mask with elements of the same size as the compared elements (similarly to the MMX compare instructions).
+<table sse4stringtable>
+<caption>SSE4.2 string compare instructions</caption>
+^ Instruction ^ length ^ type of the result ^
+| **pcmpestri** | explicit | index |
+| **pcmpestrm** | explicit | mask |
+| **pcmpistri** | implicit | index |
+| **pcmpistrm** | implicit | mask |
+</table>
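The difference between explicit and implicit length can be sketched as follows. This is a minimal Python model with hypothetical helper names (''explicit_length'', ''implicit_length''); it assumes that a register holds 16 byte elements (or 8 word elements) and that the explicit variants use the absolute value of the length register, saturated to the number of elements.

```python
def explicit_length(reg_value, elements=16):
    """Length used by pcmpestri/pcmpestrm (model).

    The absolute value of the length register, saturated to the number
    of elements in the register (16 for bytes, 8 for words).
    """
    return min(abs(reg_value), elements)


def implicit_length(data, elements=16):
    """Length used by pcmpistri/pcmpistrm (model).

    The index of the first zero element, or the full register width
    when no terminator is present in the register.
    """
    for i, value in enumerate(data[:elements]):
        if value == 0:
            return i
    return elements
```

For example, ''implicit_length(b"hello" + bytes(11))'' gives 5, while a register filled with sixteen non-zero bytes gives 16, which tells the calling loop that the string continues in the next 16-byte block.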
+The third, immediate operand encodes the data type, the comparison method and the encoding of the result.
-An interesting description of a variety of x64 AVX instructions is available on website ((https://www.officedaytime.com/simd512e/)).
+<table SSE4stringdata>
+<caption>SSE4.2 string compare input data</caption>
+^ bits 1:0 ^ data type ^
+| 00 | unsigned BYTE |
+| 01 | unsigned WORD |
+| 10 | signed BYTE |
+| 11 | signed WORD |
+</table>
+<table SSE4stringcomparisonmethod>
+<caption>SSE4.2 string compare method encoding</caption>
+^ bits 3:2 ^ operation ^ comment ^
+| 00 | Equal Any | find any of the specified characters in the input string |
+| 01 | Ranges | check if characters are within the specified ranges |
+| 10 | Equal Each | check if the input strings are equal |
+| 11 | Equal Ordered | check if the needle string is in the haystack string |
+</table>
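The two tables above can be read together with a simplified model of the four comparison methods. The following Python sketch is an illustration under stated assumptions, not a full description of the instructions: all names (''decode_imm8'', ''compare_mask'', ''first_index'', ''last_index'') are hypothetical, only unsigned BYTE elements with explicit lengths are modelled, and the remaining bits of the immediate are ignored. The first operand plays the role of the needle (or character set, or range list) and the second operand is the string being examined.

```python
WIDTH = 16  # byte elements in one XMM register

DATA_TYPES = ["unsigned BYTE", "unsigned WORD", "signed BYTE", "signed WORD"]
METHODS = ["Equal Any", "Ranges", "Equal Each", "Equal Ordered"]
EQUAL_ANY, RANGES, EQUAL_EACH, EQUAL_ORDERED = 0b00, 0b01, 0b10, 0b11


def decode_imm8(imm8):
    """Return (data type, method) encoded in bits 1:0 and 3:2."""
    return DATA_TYPES[imm8 & 0b11], METHODS[(imm8 >> 2) & 0b11]


def compare_mask(needle, haystack, method):
    """Model of the intermediate result: one boolean per element position.

    needle and haystack are bytes objects of at most WIDTH elements;
    their Python lengths stand in for the explicit lengths.
    """
    n, h = len(needle), len(haystack)
    if n > WIDTH or h > WIDTH:
        raise ValueError("operands must fit in one register")
    mask = []
    for j in range(WIDTH):
        if method == EQUAL_ANY:
            # Is haystack[j] one of the characters in the set?
            hit = j < h and haystack[j] in needle
        elif method == RANGES:
            # Needle holds (low, high) pairs; an unpaired element is ignored.
            hit = j < h and any(needle[i] <= haystack[j] <= needle[i + 1]
                                for i in range(0, n - 1, 2))
        elif method == EQUAL_EACH:
            if j < n and j < h:
                hit = needle[j] == haystack[j]
            else:
                # Past the end of both strings counts as equal,
                # past the end of only one counts as different.
                hit = j >= n and j >= h
        else:  # EQUAL_ORDERED
            hit = True
            for i in range(WIDTH - j):
                if i >= n:
                    break  # whole needle matched
                if j + i >= h or needle[i] != haystack[j + i]:
                    hit = False
                    break
        mask.append(hit)
    return mask


def first_index(mask):
    """Index of the lowest set position, or WIDTH when nothing is set."""
    for j, hit in enumerate(mask):
        if hit:
            return j
    return len(mask)


def last_index(mask):
    """Index of the highest set position, or WIDTH when nothing is set."""
    for j in range(len(mask) - 1, -1, -1):
        if mask[j]:
            return j
    return len(mask)
```

In this model, ''first_index(compare_mask(b"lo", b"hello world", EQUAL_ORDERED))'' reports position 3. Note that for Equal Ordered a needle that starts matching at the very end of a full 16-byte block is reported as a hit, because the rest of the needle would have to be checked against the next block.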
+The SSE4.2 string compare instructions are an advanced and powerful means of processing byte or word strings. A detailed explanation of their behaviour, together with illustrations, can be found on the website ((https://www.officedaytime.com/simd512e/simdimg/str.php?f=pcmpestri)).
+===== AVX =====
+AVX is the abbreviation of Advanced Vector Extensions. AVX implements larger, 256-bit YMM registers as extensions of the XMM registers. In 64-bit mode, the number of YMM registers is increased to 16. Many SSE instructions are extended to operate on the new, bigger data types; their mnemonics stay the same apart from the added **v** prefix (for example, **addps** becomes **vaddps**). The most important improvement in the instruction set of x64 processors is the implementation of RISC-like instructions in which the destination operand can differ from the two source operands. This three-operand SIMD instruction format is encoded with the VEX coding scheme. The AVX2 extension implements more SIMD instructions operating on 256-bit registers. AVX-512 extends the register size to 512 bits. An interesting, comprehensive description of a variety of x64 AVX instructions is available on the website ((https://www.officedaytime.com/simd512e/)).

en/multiasm/papc/chapter_6_11.1762885005.txt.gz · Last modified: 2025/11/11 18:16 by ktokarz