8/14/2019 Single Instruction Multiple Data (SIMD) and MMX Registers
CS220
April 23, 2007
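Some tips for lab 7

#include <stdio.h>

int main(void)
{
    float f1 = 1.1, f2 = 2.2;
    float result;
    __asm__ (
        "flds %1\n\t"     /* push f1 onto the FPU stack */
        "fadds %2\n\t"    /* st(0) = st(0) + f2 */
        "fstps %0"        /* store st(0) to result and pop, so the FPU stack stays balanced */
        : "=m"(result)
        : "m"(f1), "m"(f2)
    );
    printf("f1 + f2 = %f\n", result);
    return 0;
}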
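
The same addition, letting GCC place the operands on the FPU stack through register constraints:

#include <stdio.h>

int main(void)
{
    float f1 = 1.1, f2 = 2.2;
    float result;
    __asm__ (
        "faddp\n\t"           /* st(1) = st(1) + st(0), then pop */
        : "=t"(result)        /* result is read from the top of the stack */
        : "0"(f1), "u"(f2)    /* f1 starts in st(0), f2 in st(1) */
        : "st(1)"             /* the pop consumes st(1) */
    );
    printf("f1 + f2 = %f\n", result);
    return 0;
}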
Single Instruction Multiple Data (SIMD)
Data-level parallelism
Multimedia Extensions (MMX)
  Integers
  Reuses the FP registers
Streaming SIMD Extensions (SSE)
  Expanded with 32-bit floating-point support
  Additional registers
SSE/SSE2/SSE3
SSE
  Performs SIMD operations on floating-point data
  128-bit packed single-precision floating-point data type: contains four single-precision floating-point values
  Eight 128-bit registers (XMM0 through XMM7)
SSE2
  128-bit packed double-precision floating-point value: contains two double-precision values
  128-bit packed byte integer value: contains 16 single-byte integer values
  128-bit packed word integer value: contains eight word integer values
  128-bit packed doubleword integer value: contains four doubleword integer values
  128-bit packed quadword integer value: contains two quadword integer values
SSE3
  No new data types
MMX Registers
MMX utilizes the 80-bit FPU registers
MM0 through MM7 are directly mapped to FPU registers R0 through R7
Random access, in contrast to the register stack in the FPU
Only 64 bits are used; the upper 16 bits are set to all ones (so the value shows up as a NaN or infinity when viewed as floating point)
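
The register aliasing can be modeled in portable C. This is only a sketch, not MMX code: the struct fpu_reg80 and the functions mmx_write and exponent_all_ones are hypothetical names made up for illustration. It assumes the 80-bit extended format layout (64-bit significand, 15-bit exponent, 1 sign bit) and shows why a register written by an MMX instruction looks like a NaN or infinity in the floating-point view.

```c
#include <stdint.h>

/* Hypothetical model of one 80-bit FPU register (not a real API). */
typedef struct {
    uint64_t significand; /* low 64 bits: the part that MMn aliases */
    uint16_t sign_exp;    /* upper 16 bits: sign bit + 15-bit exponent */
} fpu_reg80;

/* Model of an MMX write: the 64-bit payload goes into the low bits,
   and the upper 16 bits are set to all ones. */
void mmx_write(fpu_reg80 *r, uint64_t value)
{
    r->significand = value;
    r->sign_exp = 0xFFFF;
}

/* An all-ones exponent is how the extended format encodes NaNs and
   infinities, which is what the floating-point view will display. */
int exponent_all_ones(const fpu_reg80 *r)
{
    return (r->sign_exp & 0x7FFF) == 0x7FFF;
}
```

Calling mmx_write on a register and then exponent_all_ones on the same register returns nonzero, whatever 64-bit payload was written.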
Two New Principles
1. Operations on packed data
   Four new 64-bit data types:
     Packed byte: eight bytes packed into one 64-bit quantity
     Packed word: four words packed into one 64-bit quantity
     Packed doubleword: two doublewords packed into one 64-bit quantity
     Quadword: one 64-bit quantity
2. Saturation arithmetic
MMX Data Types
Note that the same 64 bits in a register can be interpreted in different ways, depending on the instruction that operates on them.
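
To make the different interpretations concrete, here is a minimal sketch in portable C. The helper names lane and show_interpretations are hypothetical, and lane 0 is assumed to be the least significant element. The same 64-bit value is read as eight bytes, four words, or two doublewords.

```c
#include <stdint.h>
#include <stdio.h>

/* Extract element `i` of width `bits` (8, 16, or 32) from a 64-bit value.
   Element 0 is the least significant one. */
uint32_t lane(uint64_t q, unsigned bits, unsigned i)
{
    uint64_t mask = (1ULL << bits) - 1;
    return (uint32_t)((q >> (bits * i)) & mask);
}

/* Print the same quadword under each packed interpretation. */
void show_interpretations(uint64_t q)
{
    unsigned i;
    printf("bytes:      ");
    for (i = 0; i < 8; i++) printf("%02x ", (unsigned)lane(q, 8, i));
    printf("\nwords:      ");
    for (i = 0; i < 4; i++) printf("%04x ", (unsigned)lane(q, 16, i));
    printf("\ndoublewords: ");
    for (i = 0; i < 2; i++) printf("%08x ", (unsigned)lane(q, 32, i));
    printf("\n");
}
```

For example, show_interpretations(0x0102030405060708ULL) lists the elements of each view starting from the least significant one.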
Saturation and Wraparound
Wraparound: any overflow is truncated and only the lower bits are returned. The carry is ignored.
  Example: add two eight-bit values, 0x02 and 0xFF
  The actual sum is 0x101, but the ninth bit is truncated, and the result is 0x01
Saturation: results are clipped (saturated) to some maximum or minimum value. Limits for 8-bit and 16-bit data:

Data Type        Decimal                     Hexadecimal
                 Lower Limit   Upper Limit   Lower Limit   Upper Limit
Signed Byte      -128          +127          0x80          0x7F
Unsigned Byte    0             255           0x00          0xFF
Signed Word      -32768        +32767        0x8000        0x7FFF
Unsigned Word    0             65535         0x0000        0xFFFF
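
The two behaviors can be compared on a single 8-bit element with a short sketch in plain C. The function names are made up for illustration; the MMX instructions apply the same rule to every element of a packed operand at once.

```c
#include <stdint.h>

/* Wraparound: keep only the low 8 bits of the sum; the carry is dropped. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturation: clip the sum to the unsigned byte upper limit 0xFF. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > 0xFF ? 0xFF : (uint8_t)sum;
}

/* Signed saturation: clip the sum to the signed byte range [-128, +127]. */
int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > 127)  return 127;
    if (sum < -128) return -128;
    return (int8_t)sum;
}
```

With the slide's inputs 0x02 and 0xFF, add_wrap_u8 returns 0x01 while add_sat_u8 returns 0xFF.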
Cannot mix FPU and MMX instructions
MMX instructions can begin at any time.
Use EMMS (Empty MMX State) to reset the FP state. After any MMX instruction, the entire floating-point tag word is set to Valid (00s). EMMS sets the entire floating-point tag word to Empty (11s).
Register states (both FP and MMX) can be saved and restored with the FNSAVE and FRSTOR instructions.
Do not rely on register contents across transitions.

FP_code:
    ...
MMX_code:
    ...
    EMMS    (* mark the FP tag word as empty *)
FP_code 1:
    ...
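
The tag word rule can be written down as a tiny state model in C. This is a sketch under the assumptions stated on the slide (two tag bits per register, 00 = Valid, 11 = Empty); the type fpu_model and the function names are hypothetical and do not correspond to any real API.

```c
#include <stdint.h>

/* Hypothetical model of the 16-bit floating-point tag word:
   two bits for each of the eight registers. */
typedef struct {
    uint16_t tag_word;
} fpu_model;

/* After any MMX instruction, every tag is Valid (00). */
void model_mmx_instruction(fpu_model *f)
{
    f->tag_word = 0x0000;
}

/* EMMS marks every register as Empty (11). */
void model_emms(fpu_model *f)
{
    f->tag_word = 0xFFFF;
}

/* FP code expects to start from an empty register stack. */
int model_ready_for_fp(const fpu_model *f)
{
    return f->tag_word == 0xFFFF;
}
```

In this model, model_ready_for_fp is false after model_mmx_instruction and becomes true again only after model_emms, which mirrors the FP_code / MMX_code / EMMS ordering.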
Instruction Groups
Fifty-seven MMX instructions:
  Arithmetic Instructions
  Comparison Instructions
  Conversion Instructions
  Logical Instructions
  Shift Instructions
  Data Transfer Instructions
  Empty MMX State (EMMS) Instruction
MMX Instruction Set

Category       Mnemonic            Different Opcodes  Description
Arithmetic     PADD[B,W,D]         3                  Add with wrap-around on [byte, word, doubleword]
               PADDS[B,W]          2                  Add signed with saturation on [byte, word]
               PADDUS[B,W]         2                  Add unsigned with saturation on [byte, word]
               PSUB[B,W,D]         3                  Subtract with wrap-around on [byte, word, doubleword]
               PSUBS[B,W]          2                  Subtract signed with saturation on [byte, word]
               PSUBUS[B,W]         2                  Subtract unsigned with saturation on [byte, word]
               PMULHW              1                  Packed multiply high on words
               PMULLW              1                  Packed multiply low on words
               PMADDWD             1                  Packed multiply on words and add resulting pairs
Comparison     PCMPEQ[B,W,D]       3                  Packed compare for equality [byte, word, doubleword]
               PCMPGT[B,W,D]       3                  Packed compare greater than [byte, word, doubleword]
Conversion     PACKUSWB            1                  Pack words into bytes (unsigned with saturation)
               PACKSS[WB,DW]       2                  Pack [words into bytes, doublewords into words] (signed with saturation)
               PUNPCKH[BW,WD,DQ]   3                  Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
               PUNPCKL[BW,WD,DQ]   3                  Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
Logical        PAND                1                  Bitwise AND
               PANDN               1                  Bitwise AND NOT
               POR                 1                  Bitwise OR
               PXOR                1                  Bitwise XOR
Shift          PSLL[W,D,Q]         6                  Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRL[W,D,Q]         6                  Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
               PSRA[W,D]           4                  Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
Data Transfer  MOV[D,Q]            4                  Move [doubleword, quadword] to MMX register or from MMX register
State Mgmt     EMMS                1                  Empty MMX state
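
Two of the less obvious entries, PMADDWD and PCMPGTW, can be modeled in plain C as follows. This is a sketch of the semantics only, not MMX code: the function names and the array-based operands are assumptions made for illustration, with element 0 taken as the least significant word.

```c
#include <stdint.h>

/* Model of PMADDWD: multiply four pairs of signed words, then add
   adjacent products to produce two signed doublewords. */
void pmaddwd_model(const int16_t a[4], const int16_t b[4], int32_t out[2])
{
    int i;
    for (i = 0; i < 2; i++) {
        int64_t sum = (int64_t)a[2 * i] * b[2 * i]
                    + (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* Keep the low 32 bits; the sum only exceeds the signed range
           when all four inputs of a pair are -32768. */
        out[i] = (int32_t)(uint32_t)sum;
    }
}

/* Model of PCMPGTW: signed compare of each word pair; the result element
   is all ones where a > b and all zeros otherwise. */
void pcmpgtw_model(const int16_t a[4], const int16_t b[4], uint16_t out[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        out[i] = (a[i] > b[i]) ? 0xFFFF : 0x0000;
    }
}
```

The all-ones or all-zeros result of the compare is what makes it usable as a mask for the logical instructions (PAND, PANDN, POR).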
Data Transfer Instructions
The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to MMX registers and vice versa, or from integer registers to MMX registers and vice versa. Examples:
    movd %eax, %mm0
    movd my32bits, %mm0
    movd %mm0, my32bits
    movd %mm0, %mm1    (WRONG!)
The MOVQ (Move 64 Bits) instruction transfers 64 bits of packed data from memory to MMX registers and vice versa, or transfers data between MMX registers. Examples:
    movq %mm0, my64bits
    movq my64bits, %mm0
MOVD cannot move data between two MMX registers; it behaves like a load/store. Use MOVQ for register-to-register moves.
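
The effect of MOVD on the 64-bit register can be sketched in C. The function names are hypothetical, and the sketch assumes the behavior documented by Intel: a MOVD into an MMX register zero-extends the 32-bit source, and a MOVD out of an MMX register copies only the low 32 bits.

```c
#include <stdint.h>

/* Model of `movd src32, %mmN`: the low doubleword receives the source,
   the high doubleword is cleared. */
uint64_t movd_to_mmx(uint32_t src)
{
    return (uint64_t)src;
}

/* Model of `movd %mmN, dest32`: only the low doubleword is copied out. */
uint32_t movd_from_mmx(uint64_t mm)
{
    return (uint32_t)(mm & 0xFFFFFFFFu);
}
```

So a value moved in with MOVD and back out with MOVD is unchanged, but the upper half of the MMX register is lost on the way in.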
Instruction format
OPERATION SRC, DEST (AT&T syntax) would be decoded as:
    DEST = DEST OPERATION SRC
A typical MMX instruction has this syntax:
  Prefix: P for Packed
  Instruction operation: for example, ADD, CMP, or XOR
  Suffix:
    US for unsigned saturation
    S for signed saturation
    B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword
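
As a worked reading of the naming scheme, PADDUSB is P (packed) + ADD + US (unsigned saturation) + B (bytes). The sketch below models `paddusb SRC, DEST` in portable C to show the DEST = DEST OPERATION SRC convention; the function name paddusb_model is made up for illustration and the code is an emulation, not the instruction itself.

```c
#include <stdint.h>

/* Model of `paddusb src, dest` in AT&T operand order:
   each of the eight bytes of dest becomes dest + src,
   clipped to 0xFF on overflow. */
void paddusb_model(uint64_t src, uint64_t *dest)
{
    uint64_t result = 0;
    int i;
    for (i = 0; i < 8; i++) {
        unsigned a = (unsigned)((*dest >> (8 * i)) & 0xFF);
        unsigned b = (unsigned)((src >> (8 * i)) & 0xFF);
        unsigned sum = a + b;
        if (sum > 0xFF) {
            sum = 0xFF;          /* unsigned saturation */
        }
        result |= (uint64_t)sum << (8 * i);
    }
    *dest = result;              /* the second operand is overwritten */
}
```

Only the destination operand changes; the source is read but left untouched, just as in the two-operand instruction form.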
For the rest of today's class, we will go through the MMX instructions on this page:
http://www.tommesani.com/MMXPrimer.html
Note the difference between Intel syntax and AT&T syntax:
http://www.imada.sdu.dk/~kslarsen/dm18/Litteratur/IntelnATT.htm
The primer page uses Intel syntax, in which the positions of the source and destination operands are swapped compared to AT&T syntax.
The pseudo-code explanation of each instruction is the same.
You may also want to refer to the official Intel MMX reference manual for a better explanation (also in Intel syntax):
ftp://download.intel.com/ids/mmx/MMX_Manual_%20Prog_Ref.pdf
Examples and applications of MMX instructions will be covered in the next class.