This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| en:multiasm:papc:chapter_6_11 [2025/11/11 16:10] – [SSSE3] ktokarz | en:multiasm:papc:chapter_6_11 [2025/11/11 21:39] (current) – [AVX] ktokarz | ||
|---|---|---|---|
| Line 277: | Line 277: | ||
| </ | </ | ||
| - | Two data shuffle instructions are worth mentioning. | + | Two data shuffle instructions are worth mentioning. |
| + | * bit 7 is 1 - the destination byte is cleared | ||
| + | * bit 7 is 0 - the destination byte receives a copy of the selected source byte | ||
| + | * bits 0-3 - the index of the source byte to be copied | ||
| + | The illustration is shown in figure {{ref> | ||
| + | <figure sse3pshufb> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
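The control-byte rules listed above can be modelled in a few lines. This is only a minimal scalar sketch of the 128-bit form of **pshufb**, not real SIMD code; the function name `pshufb` and the use of Python `bytes` objects for register contents are assumptions made for illustration (byte 0 is the least significant byte).

```python
def pshufb(data: bytes, control: bytes) -> bytes:
    """Scalar model of the 128-bit pshufb (illustrative helper only).

    data    - 16 bytes of the operand being shuffled
    control - 16 control bytes, one per destination byte
    """
    if len(data) != 16 or len(control) != 16:
        raise ValueError("both operands must be exactly 16 bytes long")
    result = bytearray(16)
    for i, ctrl in enumerate(control):
        if ctrl & 0x80:
            result[i] = 0                    # bit 7 set: byte is cleared
        else:
            result[i] = data[ctrl & 0x0F]    # bits 0-3: index of the source byte
    return bytes(result)
```

For example, a control vector holding the values 15 down to 0 reverses the byte order, and a control vector filled with one index broadcasts that byte to all positions.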
| - | An interesting description of a variety of x64 AVX instructions is available on website ((https:// | + | The **palignr** instruction combines bytes from two source operands as shown in figure {{ref> |
| + | |||
| + | <figure sse3palignr> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
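As a rough illustration of how **palignr** combines its operands, the sketch below models the 128-bit form: the destination and source are concatenated (destination in the upper half), the 32-byte value is shifted right by the number of bytes given in the immediate operand, and the lowest 16 bytes form the result. The function name and the `bytes` representation (byte 0 is the least significant) are assumptions made for this example.

```python
def palignr(dst: bytes, src: bytes, imm8: int) -> bytes:
    """Scalar model of the 128-bit palignr (illustrative helper only)."""
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("both operands must be exactly 16 bytes long")
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must fit in one byte")
    combined = src + dst                     # src is the low half, dst the high half
    window = combined[imm8:imm8 + 16]        # byte-wise shift right by imm8
    return window + bytes(16 - len(window))  # bytes shifted in from the top are zero
```

With an immediate of 0 the result equals the source operand, with 16 it equals the original destination, and with 32 or more the result is all zeros.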
| + | |||
| + | ===== SSE4 ===== | ||
| + | SSE4 consists of two groups, SSE4.1 and SSE4.2. Both include instructions that supplement previous extensions. For example, eight instructions expand support for packed integer minimum and maximum determination, | ||
| + | The **dpps** and **dppd** | ||
| + | <figure sse4dotproduct> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
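The role of the immediate operand in **dpps** can be sketched as follows. In this simplified model the upper four bits of the immediate select which element pairs take part in the sum, and the lower four bits select which destination elements receive the result (the others are set to zero). The function name and the use of Python lists of floats are assumptions; real hardware works in single precision with its own rounding, which this sketch does not reproduce.

```python
def dpps(dst, src, imm8):
    """Simplified model of dpps on four-element vectors (element 0 is the lowest)."""
    if len(dst) != 4 or len(src) != 4:
        raise ValueError("both operands must have exactly four elements")
    # Bits 4-7: which products are included in the sum.
    total = sum(dst[i] * src[i] for i in range(4) if imm8 & (0x10 << i))
    # Bits 0-3: which destination elements receive the sum.
    return [float(total) if imm8 & (1 << i) else 0.0 for i in range(4)]
```

With the immediate 0xF1, all four products are summed and the result is stored only in the lowest element; with 0x3F, only the two lowest products are summed and the result is copied to every element.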
| + | |||
| + | There are also advanced shuffle, insert and extract instructions which make it possible to manipulate positions of the data of various types. A few examples will be shown in the following figures. The **insertps** inserts a scalar single-precision floating-point value with the position of the vector' | ||
| + | <figure sse4insertps> | ||
| + | {{ : | ||
| + | < | ||
| + | </ | ||
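A possible scalar model of the register-to-register form of **insertps** is sketched below. It assumes the usual layout of the immediate operand: bits 7:6 select the source element, bits 5:4 select the destination position, and bits 3:0 form a mask of destination elements to be cleared. The function name and list representation are illustrative only.

```python
def insertps(dst, src, imm8):
    """Simplified model of insertps, register source form (element 0 is the lowest)."""
    if len(dst) != 4 or len(src) != 4:
        raise ValueError("both operands must have exactly four elements")
    src_index = (imm8 >> 6) & 0x3      # which source element to take
    dst_index = (imm8 >> 4) & 0x3      # where to place it in the destination
    zero_mask = imm8 & 0xF             # destination elements to clear afterwards
    result = list(dst)
    result[dst_index] = src[src_index]
    return [0.0 if zero_mask & (1 << i) else value
            for i, value in enumerate(result)]
```

For instance, the immediate 0x90 copies source element 2 into destination element 1 and leaves the remaining elements unchanged.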
| + | In SSE4.2, a set of string compare instructions was added. As the XMM registers can hold sixteen bytes, it is much more efficient to implement string processing algorithms with the larger XMM registers than with the general-purpose registers of the main processor and the classic string instructions. There are four string compare instructions (see table {{ref> | ||
| + | <table sse4stringtable> | ||
| + | < | ||
| + | ^ Instruction ^ String length ^ Result type ^ | ||
| + | | **pcmpestri** | explicit | index | | ||
| + | | **pcmpestrm** | explicit | mask | | ||
| + | | **pcmpistri** | implicit | index | | ||
| + | | **pcmpistrm** | implicit | mask | | ||
| + | </ | ||
| + | |||
| + | The third operand, an immediate byte, encodes the data type, the comparison method and the format of the result. | ||
| + | |||
| + | <table SSE4stringdata> | ||
| + | < | ||
| + | ^ bits 1:0 ^ data type ^ | ||
| + | | 00 | unsigned BYTE | | ||
| + | | 01 | unsigned WORD | | ||
| + | | 10 | signed BYTE | | ||
| + | | 11 | signed WORD | | ||
| + | </ | ||
| + | |||
| + | <table SSE4stringcomparisonmethod> | ||
| + | < | ||
| + | ^ bits 3:2 ^ operation ^ comment ^ | ||
| + | | 00 | Equal Any | find any of the specified characters in the input string | | ||
| + | | 01 | Ranges | check if characters are within the specified ranges | | ||
| + | | 10 | Equal Each | check if the input strings are equal | | ||
| + | | 11 | Equal Ordered | check if the needle string is in the haystack string | | ||
| + | </ | ||
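The two bit fields from the tables above can be decoded as in the sketch below, which also models the Equal Any comparison of **pcmpistri** for unsigned bytes (immediate 0x00, least significant index returned). All names are hypothetical, and the model covers only this one mode: strings are treated as implicit-length, that is, terminated by a zero byte or by the end of the 16-byte register.

```python
DATA_TYPES = {0b00: "unsigned BYTE", 0b01: "unsigned WORD",
              0b10: "signed BYTE", 0b11: "signed WORD"}
OPERATIONS = {0b00: "Equal Any", 0b01: "Ranges",
              0b10: "Equal Each", 0b11: "Equal Ordered"}

def decode_imm8(imm8):
    """Return (data type, operation) encoded in bits 1:0 and 3:2 of imm8."""
    return DATA_TYPES[imm8 & 0x3], OPERATIONS[(imm8 >> 2) & 0x3]

def _load(block: bytes):
    """Model loading up to 16 bytes into a register; return (register, implicit length)."""
    reg = block[:16].ljust(16, b"\0")
    end = reg.find(0)
    return reg, (16 if end < 0 else end)

def equal_any_index(charset: bytes, text: bytes) -> int:
    """Index of the first byte of text found in charset, or 16 if there is none."""
    chars, chars_len = _load(charset)
    reg, text_len = _load(text)
    wanted = chars[:chars_len]
    for i in range(text_len):
        if reg[i] in wanted:
            return i
    return 16   # no match among the valid bytes
```

Searching the text `b"rhythm and"` for any of the characters in `b"aeiou"` gives index 7 in this model, the position of the first vowel.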
| + | The SSE4.2 string compare instructions are an advanced, powerful means of processing byte or word strings. A detailed explanation of the behaviour of the SSE4.2 string instructions, together with illustrations, can be found on ((https:// | ||
| + | |||
| + | ===== AVX ===== | ||
| + | AVX is the abbreviation of Advanced Vector Extensions. AVX introduces larger 256-bit YMM registers as extensions of the XMM registers. In 64-bit mode, the number of YMM registers is increased to 16. Many SSE instructions are extended to handle the new, larger data types; their mnemonics stay the same apart from the added **v** prefix (for example, **addps** becomes **vaddps**). The most important improvement in the instruction set of x64 processors is the introduction of RISC-like three-operand instructions, in which the destination operand can differ from both source operands. This three-operand SIMD format is encoded with the VEX coding scheme. The AVX2 extension adds more SIMD instructions that operate on 256-bit registers. AVX-512 extends the register size to 512 bits. An interesting, comprehensive | ||