This shows you the differences between two versions of the page.
| en:multiasm:papc:chapter_6_11 [2025/11/11 18:16] – [SSE4] ktokarz | en:multiasm:papc:chapter_6_11 [2025/11/11 21:39] (current) – [AVX] ktokarz | ||
|---|---|---|---|
| Line 302: | Line 302: | ||
| </ | </ | ||
| - | There are also advanced shuffle, insert and extract instructions which make it possible to manipulate positions of the data of various types. A few examples will be shown in the following figures. The **insertps** inserts a scalar single-precision floating-point value with the position of the vector' | + | There are also advanced shuffle, insert and extract instructions which make it possible to manipulate positions of the data of various types. A few examples will be shown in the following figures. The **insertps** inserts a scalar single-precision floating-point value with the position of the vector' |
| <figure sse4insertps> | <figure sse4insertps> | ||
| - | {{ : | + | {{ : |
| < | < | ||
| </ | </ | ||
| + | SSE4.2 added a set of string compare instructions. Because an XMM register holds sixteen bytes, string processing algorithms can be implemented much more efficiently with XMM registers than with the general-purpose registers and the classic string instructions. There are four string compare instructions (see table {{ref> | ||
| + | <table sse4stringtable> | ||
| + | < | ||
| + | ^ Instruction ^ length ^ type of the result ^ | ||
| + | | **pcmpestri** | explicit | index | | ||
| + | | **pcmpestrm** | explicit | mask | | ||
| + | | **pcmpistri** | implicit | index | | ||
| + | | **pcmpistrm** | implicit | mask | | ||
| + | </ | ||
| + | The third operand, an 8-bit immediate, encodes the data type, the comparison method and the encoding of the result. | ||
| - | An interesting description of a variety of x64 AVX instructions is available on website ((https:// | + | <table SSE4stringdata> |
| + | < | ||
| + | ^ bits 1:0 ^ data type ^ | ||
| + | | 00 | unsigned BYTE | | ||
| + | | 01 | unsigned WORD | | ||
| + | | 10 | signed BYTE | | ||
| + | | 11 | signed WORD | | ||
| + | </ | ||
| + | |||
| + | <table SSE4stringcomparisonmethod> | ||
| + | < | ||
| + | ^ bits 3:2 ^ operation ^ comment ^ | ||
| + | | 00 | Equal Any | find any of the specified characters in the input string | | ||
| + | | 01 | Ranges | check if characters are within the specified ranges | | ||
| + | | 10 | Equal Each | check if the input strings are equal | | ||
| + | | 11 | Equal Ordered | check if the needle string is in the haystack string | | ||
| + | </ | ||
| + | The SSE4.2 string compare instructions are an advanced, powerful means of processing byte or word strings. A detailed explanation of the behaviour of the SSE4.2 string instructions, together with illustrations, can be found at ((https:// | ||
| + | |||
| + | ===== AVX ===== | ||
| + | AVX is the abbreviation of Advanced Vector Extensions. AVX introduces larger, 256-bit YMM registers as extensions of the XMM registers. In 64-bit mode, 16 YMM registers are available. Many SSE instructions are extended to operate on the new, bigger data types; their AVX forms keep the SSE mnemonic with a **v** prefix (for example, **addps** becomes **vaddps**). The most important improvement in the instruction set of x64 processors is the implementation of RISC-like instructions in which the destination operand can differ from the two source operands. This three-operand SIMD instruction format is made possible by the VEX coding scheme. The AVX2 extension implements more SIMD instructions for operation with 256-bit registers. AVX-512 extends the register size to 512 bits. An interesting, comprehensive | ||