The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.
The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.^{[1]}
x86 integer instructions
This is the full 8086/8088 instruction set of Intel. Most if not all of these instructions are available in 32-bit mode; they just operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. See also x86 assembly language for a quick tutorial for this processor family. The updated instruction set is also grouped according to architecture (i386, i486, i686) and more generally is referred to as x86-32 and x86-64 (also known as AMD64).
Original 8086/8088 instructions
Instruction  Meaning  Notes  Opcode 

AAA  ASCII adjust AL after addition  used with unpacked binary coded decimal  0x37 
AAD  ASCII adjust AX before division  The 8086/8088 datasheet documents only the base-10 version of AAD (opcode 0xD5 0x0A), but any other base will work. Later Intel documentation describes the generic form too. NEC V20 and V30 (and possibly other NEC V-series CPUs) always use base 10 and ignore the argument, causing a number of incompatibilities  0xD5
AAM  ASCII adjust AX after multiplication  Only the base-10 version (operand 0x0A) is documented; see notes for AAD  0xD4
AAS  ASCII adjust AL after subtraction  0x3F  
ADC  Add with carry  destination := destination + source + carry flag  0x10…0x15, 0x80/2…0x83/2
ADD  Add  (1) r/m += r/imm; (2) r += m/imm;  0x00…0x05, 0x80/0…0x83/0
AND  Logical AND  (1) r/m &= r/imm; (2) r &= m/imm;  0x20…0x25, 0x80/4…0x83/4
CALL  Call procedure  push (E)IP, i.e. the address of the instruction directly after the CALL (far calls push CS first), then jump to the target  0x9A, 0xE8, 0xFF/2, 0xFF/3
CBW  Convert byte to word  0x98  
CLC  Clear carry flag  CF = 0;  0xF8
CLD  Clear direction flag  DF = 0;  0xFC
CLI  Clear interrupt flag  IF = 0;  0xFA
CMC  Complement carry flag  0xF5  
CMP  Compare operands  0x38…0x3D, 0x80/7…0x83/7  
CMPSB  Compare bytes in memory  0xA6  
CMPSW  Compare words  0xA7  
CWD  Convert word to doubleword  0x99  
DAA  Decimal adjust AL after addition  (used with packed binary coded decimal)  0x27 
DAS  Decimal adjust AL after subtraction  0x2F  
DEC  Decrement by 1  0x48…0x4F, 0xFE/1, 0xFF/1  
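DIV  Unsigned divide  (1) AX = DX:AX / r/m16, DX = remainder; (2) AL = AX / r/m8, AH = remainder; raises a divide error if the divisor is zero or the quotient does not fit  0xF6/6, 0xF7/6
ESC  Used with floating-point unit  0xD8…0xDF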
HLT  Enter halt state  0xF4  
IDIV  Signed divide  (1) AX = DX:AX / r/m16, DX = remainder; (2) AL = AX / r/m8, AH = remainder; the quotient is truncated toward zero  0xF6/7, 0xF7/7
IMUL  Signed multiply  (1) DX:AX = AX * r/m16; (2) AX = AL * r/m8  0x69, 0x6B (both since 80186), 0xF6/5, 0xF7/5, 0x0FAF (since 80386)
IN  Input from port  (1) AL = port[imm]; (2) AL = port[DX]; (3) AX = port[imm]; (4) AX = port[DX];  0xE4, 0xE5, 0xEC, 0xED
INC  Increment by 1  0x40…0x47, 0xFE/0, 0xFF/0  
INT  Call to interrupt  0xCC, 0xCD  
INTO  Call to interrupt if overflow  0xCE  
IRET  Return from interrupt  0xCF  
Jcc  Jump if condition  (JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ)  0x70…0x7F, 0x0F80…0x0F8F (since 80386) 
JCXZ  Jump if CX is zero  0xE3  
JMP  Jump  0xE9…0xEB, 0xFF/4, 0xFF/5  
LAHF  Load FLAGS into AH register  0x9F  
LDS  Load pointer using DS  0xC5  
LEA  Load Effective Address  0x8D  
LES  Load ES with pointer  0xC4  
LOCK  Assert BUS LOCK# signal  (for multiprocessing)  0xF0 
LODSB  Load string byte  if (DF==0) AL = *SI++; else AL = *SI--;  0xAC
LODSW  Load string word  if (DF==0) AX = *SI++; else AX = *SI--;  0xAD
LOOP/LOOPx  Loop control  (LOOPE, LOOPNE, LOOPNZ, LOOPZ) if (x && --CX) goto lbl;  0xE0…0xE2
MOV  Move  copies data from one location to another, (1) r/m = r; (2) r = r/m;  0x88…0x8C, 0x8E, 0xA0…0xA3, 0xB0…0xBF, 0xC6, 0xC7
MOVSB  Move byte from string to string  if (DF==0) *(byte*)DI++ = *(byte*)SI++; else *(byte*)DI-- = *(byte*)SI--;  0xA4
MOVSW  Move word from string to string  if (DF==0) *(word*)DI++ = *(word*)SI++; else *(word*)DI-- = *(word*)SI--;  0xA5
MUL  Unsigned multiply  (1) DX:AX = AX * r/m16; (2) AX = AL * r/m8;  0xF6/4…0xF7/4
NEG  Two's complement negation  r/m *= -1;  0xF6/3…0xF7/3
NOP  No operation  opcode equivalent to XCHG (E)AX, (E)AX  0x90
NOT  Logical NOT (one's complement)  r/m ^= -1;  0xF6/2…0xF7/2
OR  Logical OR  (1) r/m |= r/imm; (2) r |= m/imm;  0x08…0x0D, 0x80/1…0x83/1
OUT  Output to port  (1) port[imm] = AL; (2) port[DX] = AL; (3) port[imm] = AX; (4) port[DX] = AX;  0xE6, 0xE7, 0xEE, 0xEF
POP  Pop data from stack  r/m = *SP++; POP CS (opcode 0x0F) works only on 8086/8088. Later CPUs use 0x0F as a prefix for newer instructions.  0x07, 0x0F (8086/8088 only), 0x17, 0x1F, 0x58…0x5F, 0x8F/0
POPF  Pop FLAGS register from stack  FLAGS = *SP++;  0x9D
PUSH  Push data onto stack  *--SP = r/m;  0x06, 0x0E, 0x16, 0x1E, 0x50…0x57, 0x68, 0x6A (both since 80186), 0xFF/6
PUSHF  Push FLAGS onto stack  *--SP = FLAGS;  0x9C
RCL  Rotate left (with carry)  0xC0…0xC1/2 (since 80186), 0xD0…0xD3/2  
RCR  Rotate right (with carry)  0xC0…0xC1/3 (since 80186), 0xD0…0xD3/3  
REPxx  Repeat MOVS/STOS/CMPS/LODS/SCAS  (REP, REPE, REPNE, REPNZ, REPZ)  0xF2, 0xF3 
RET  Return from procedure  Not a real instruction. The assembler translates it to RETN or RETF depending on the memory model of the target system.
RETN  Return from near procedure  0xC2, 0xC3  
RETF  Return from far procedure  0xCA, 0xCB  
ROL  Rotate left  0xC0…0xC1/0 (since 80186), 0xD0…0xD3/0  
ROR  Rotate right  0xC0…0xC1/1 (since 80186), 0xD0…0xD3/1  
SAHF  Store AH into FLAGS  0x9E  
SAL  Shift arithmetically left (signed shift left)  (1) r/m <<= 1; (2) r/m <<= CL;  0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4
SAR  Shift arithmetically right (signed shift right)  (1) (signed) r/m >>= 1; (2) (signed) r/m >>= CL;  0xC0…0xC1/7 (since 80186), 0xD0…0xD3/7
SBB  Subtraction with borrow  destination := destination - (source + carry flag); an alternative 1-byte encoding of SBB AL, AL is available via the undocumented SALC instruction  0x18…0x1D, 0x80/3…0x83/3
SCASB  Compare byte string  0xAE  
SCASW  Compare word string  0xAF  
SHL  Shift left (unsigned shift left)  0xC0…0xC1/4 (since 80186), 0xD0…0xD3/4  
SHR  Shift right (unsigned shift right)  0xC0…0xC1/5 (since 80186), 0xD0…0xD3/5  
STC  Set carry flag  CF = 1;  0xF9
STD  Set direction flag  DF = 1;  0xFD
STI  Set interrupt flag  IF = 1;  0xFB
STOSB  Store byte in string  if (DF==0) *ES:DI++ = AL; else *ES:DI-- = AL;  0xAA
STOSW  Store word in string  if (DF==0) *ES:DI++ = AX; else *ES:DI-- = AX;  0xAB
SUB  Subtraction  (1) r/m -= r/imm; (2) r -= m/imm;  0x28…0x2D, 0x80/5…0x83/5
TEST  Logical compare (AND)  (1) r/m & r/imm; (2) r & m/imm; only the flags are updated  0x84, 0x85, 0xA8, 0xA9, 0xF6/0, 0xF7/0
WAIT  Wait until not busy  Waits until the BUSY# pin is inactive (used with floating-point unit)  0x9B
XCHG  Exchange data  r :=: r/m; A spinlock typically uses XCHG as an atomic operation (see Cyrix coma bug).  0x86, 0x87, 0x91…0x97
XLAT  Table lookup translation  behaves like MOV AL, [BX+AL]  0xD7
XOR  Exclusive OR  (1) r/m ^= r/imm; (2) r ^= m/imm;  0x30…0x35, 0x80/6…0x83/6
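The behavior of DIV can be illustrated with a small C model of its 16-bit form, in which the 32-bit dividend is DX:AX, the quotient is written to AX and the remainder to DX. This is a sketch for illustration only: `div16` is a hypothetical helper name, flags are not modeled, and the `false` return stands in for the divide error (interrupt 0) that the processor raises.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of DIV r/m16: DX:AX / divisor -> quotient in AX, remainder in DX.
 * Returns false where the CPU would raise a divide error (interrupt 0):
 * a zero divisor, or a quotient too large for AX. Flags are not modeled. */
static bool div16(uint16_t *ax, uint16_t *dx, uint16_t divisor)
{
    uint32_t dividend = ((uint32_t)*dx << 16) | *ax;

    if (divisor == 0)
        return false;
    uint32_t quotient = dividend / divisor;
    if (quotient > 0xFFFFu)
        return false;                      /* quotient overflow */

    *ax = (uint16_t)quotient;
    *dx = (uint16_t)(dividend % divisor);
    return true;
}
```

For example, with DX = 0x0001 and AX = 0x0000 (65536) and a divisor of 10, the model leaves AX = 6553 and DX = 6; with DX = 0x000A the quotient 65536 does not fit in AX, so the model reports a divide error.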
Added in specific processors
Added with 80186/80188
Instruction  Meaning  Notes 

BOUND  Check array index against bounds  raises software interrupt 5 if test fails 
ENTER  Enter stack frame  Modifies the stack on entry to a procedure for a high-level language. Takes two operands: the amount of storage to be allocated on the stack and the nesting level of the procedure.
INS  Input from port to string  equivalent to:
IN (E)AX, DX
MOV ES:[(E)DI], (E)AX
; adjust (E)DI according to operand size and DF
LEAVE  Leave stack frame  Releases the local stack storage created by the previous ENTER instruction.
OUTS  Output string to port  equivalent to:
MOV (E)AX, DS:[(E)SI]
OUT DX, (E)AX
; adjust (E)SI according to operand size and DF
POPA  Pop all general-purpose registers from stack  equivalent to:
POP DI
POP SI
POP BP
POP AX ; no POP SP here; all it does is ADD SP, 2 (since AX will be overwritten later)
POP BX
POP DX
POP CX
POP AX
PUSHA  Push all general-purpose registers onto stack  equivalent to:
PUSH AX
PUSH CX
PUSH DX
PUSH BX
PUSH SP ; the value stored is the initial SP value
PUSH BP
PUSH SI
PUSH DI
PUSH immediate  Push an immediate byte/word value onto the stack  for example:
PUSH 12h
PUSH 1200h
IMUL immediate  Signed multiplication of immediate byte/word value  for example:
IMUL BX, 12h
IMUL DX, 1200h
IMUL CX, DX, 12h
IMUL BX, SI, 1200h
IMUL DI, word ptr [BX+SI], 12h
IMUL SI, word ptr [BP-4], 1200h
SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR immediate  Rotate/shift bits with an immediate value greater than 1  for example:
ROL AX, 3
SHR BL, 3
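The PUSHA and POPA sequences can be modeled in C as a sketch. The register indices, the `cpu16` structure and the `pusha`/`popa` helpers are hypothetical, and to keep the model short the stack pointer counts 16-bit words rather than bytes (a real PUSH subtracts 2 from SP).

```c
#include <stdint.h>

/* Hypothetical register indices, listed in the order PUSHA pushes them. */
enum { AX, CX, DX, BX, SP, BP, SI, DI, NREGS };

/* Toy 16-bit machine state; SP indexes stack[] in words, not bytes. */
struct cpu16 {
    uint16_t reg[NREGS];
    uint16_t stack[64];
};

static void push(struct cpu16 *c, uint16_t v) { c->stack[--c->reg[SP]] = v; }
static uint16_t pop(struct cpu16 *c) { return c->stack[c->reg[SP]++]; }

/* PUSHA: push AX, CX, DX, BX, the original SP, BP, SI, DI. */
static void pusha(struct cpu16 *c)
{
    uint16_t initial_sp = c->reg[SP];
    for (int r = AX; r <= DI; r++)
        push(c, r == SP ? initial_sp : c->reg[r]);
}

/* POPA: pop in reverse order; the saved SP is popped but discarded. */
static void popa(struct cpu16 *c)
{
    for (int r = DI; r >= AX; r--) {
        uint16_t v = pop(c);
        if (r != SP)
            c->reg[r] = v;
    }
}
```

Because the saved SP is skipped on the way back, POPA restores SP purely by the pops themselves, which matches the "ADD SP, 2" comment in the POPA sequence.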
Added with 80286
Instruction  Meaning  Notes 

ARPL  Adjust RPL field of selector  
CLTS  Clear task-switched flag in register CR0  
LAR  Load access rights byte  
LGDT  Load global descriptor table  
LIDT  Load interrupt descriptor table  
LLDT  Load local descriptor table  
LMSW  Load machine status word  
LOADALL  Load all CPU registers, including internal ones such as GDT  Undocumented, 80286 and 80386 only 
LSL  Load segment limit  
LTR  Load task register  
SGDT  Store global descriptor table  
SIDT  Store interrupt descriptor table  
SLDT  Store local descriptor table  
SMSW  Store machine status word  
STR  Store task register  
VERR  Verify a segment for reading  
VERW  Verify a segment for writing 
Added with 80386
Instruction  Meaning  Notes 

BSF  Bit scan forward  
BSR  Bit scan reverse  
BT  Bit test  
BTC  Bit test and complement  
BTR  Bit test and reset  
BTS  Bit test and set  
CDQ  Convert doubleword to quadword  Sign-extends EAX into EDX, forming the quadword EDX:EAX. Since (I)DIV uses EDX:EAX as its dividend, CDQ is executed after setting EAX and before IDIV unless EDX has been set explicitly (as in a full 64/32 division); before an unsigned DIV, EDX is normally zeroed instead.
CMPSD  Compare string doubleword  Compares ES:[(E)DI] with DS:[(E)SI] and increments or decrements both (E)DI and (E)SI, depending on DF; can be prefixed with REP 
CWDE  Convert word to doubleword  Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX 
IBTS  Insert Bit String  discontinued with B1 step of 80386 
INSD  Input from port to string doubleword  
IRETx  Interrupt return; D suffix means 32-bit return, F suffix means do not generate epilogue code (i.e. LEAVE instruction)  Use IRETD rather than IRET in 32-bit situations 
JECXZ  Jump if ECX is zero  
LFS, LGS  Load far pointer  
LSS  Load stack segment  
LODSD  Load string doubleword  EAX = *DS:ESI±±; (±± depends on DF; DS can be overridden with a segment prefix); can be prefixed with REP
LOOPW, LOOPccW  Loop, conditional loop  Same as LOOP, LOOPcc for earlier processors 
LOOPD, LOOPccD  Loop, conditional loop (32-bit counter)  if (cc && --ECX) goto lbl; cc = Z(ero), E(qual), N(on)Z(ero), N(on)E(qual)
MOV to/from CR/DR/TR  Move to/from special registers  CR=control registers, DR=debug registers, TR=test registers (up to 80486) 
MOVSD  Move string doubleword  *(dword*)ES:EDI±± = *(dword*)ESI±±; (±± depends on DF); can be prefixed with REP
MOVSX  Move with sign-extension  (long)r = (signed char) r/m; and similar
MOVZX  Move with zero-extension  (long)r = (unsigned char) r/m; and similar
OUTSD  Output to port from string doubleword  port[DX] = *(long*)ESI±±; (±± depends on DF)
POPAD  Pop all doubleword (32-bit) registers from stack  The saved ESP value is popped and discarded rather than loaded into ESP 
POPFD  Pop data into EFLAGS register  
PUSHAD  Push all doubleword (32bit) registers onto stack  
PUSHFD  Push EFLAGS register onto stack  
SCASD  Scan string data doubleword  Compares ES:[(E)DI] with EAX and increments or decrements (E)DI, depending on DF; can be prefixed with REP 
SETcc  Set byte to one on condition, zero otherwise  (SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE, SETPO, SETS, SETZ) 
SHLD  Shift left doubleword  r1 = r1<<CL | r2>>(32-CL); an 8-bit immediate count can be used instead of CL
SHRD  Shift right doubleword  r1 = r1>>CL | r2<<(32-CL); an 8-bit immediate count can be used instead of CL
STOSD  Store string doubleword  *ES:EDI±± = EAX; (±± depends on DF, ES cannot be overridden); can be prefixed with REP
XBTS  Extract Bit String  discontinued with B1 step of 80386 
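A C sketch of the 32-bit double-precision shifts, assuming the register forms with the count taken from CL or an immediate. The helper names are hypothetical, flags are not modeled, and the zero-count case is handled separately because the C expression `src << 32` would be undefined.

```c
#include <stdint.h>

/* SHRD dst, src, count: dst shifts right; vacated high bits are filled from src.
 * The count is masked to 5 bits for 32-bit operands; a count of 0 changes nothing. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31;
    if (count == 0)
        return dst;
    return (dst >> count) | (src << (32 - count));
}

/* SHLD dst, src, count: dst shifts left; vacated low bits are filled from src. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31;
    if (count == 0)
        return dst;
    return (dst << count) | (src >> (32 - count));
}
```

For example, shifting 0x12345678 right by 8 with 0xABCDEF01 as the fill source gives 0x01123456: the low byte of the source moves into the top byte of the destination.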
Added with 80486
Instruction  Meaning  Notes 

BSWAP  Byte Swap  r = r<<24 | r<<8&0x00FF0000 | r>>8&0x0000FF00 | r>>24; Only defined for 32-bit registers. Usually used to change between little-endian and big-endian representations. When used with 16-bit registers, it produces differing results on the 486,^{[2]} 586, and Bochs/QEMU.^{[3]} 
CMPXCHG  atomic CoMPare and eXCHanGe  See compare-and-swap; also available as an undocumented opcode on later 80386 processors 
INVD  Invalidate Internal Caches  Invalidates the internal caches without writing modified lines back to memory 
INVLPG  Invalidate TLB Entry  Invalidate TLB Entry for page that contains data specified 
WBINVD  Write Back and Invalidate Cache  Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal caches. 
XADD  eXchange and ADD  Exchanges the first operand with the second operand, then loads the sum of the two values into the destination operand. 
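The BSWAP formula from the table can be written directly in C for a 32-bit value. The function name `bswap32` is only illustrative; GCC and Clang also provide the builtin `__builtin_bswap32`, which compilers typically lower to BSWAP on x86.

```c
#include <stdint.h>

/* Byte swap: byte 0 <-> byte 3 and byte 1 <-> byte 2 (little <-> big endian). */
static uint32_t bswap32(uint32_t r)
{
    return (r << 24)
         | ((r << 8) & 0x00FF0000u)
         | ((r >> 8) & 0x0000FF00u)
         | (r >> 24);
}
```

Swapping twice returns the original value, so the same function converts in both directions.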
Added with Pentium
Instruction  Meaning  Notes 

CPUID  CPU IDentification  Returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers; the function is selected by the value in EAX.^{[1]} This was also added to later 80486 processors 
CMPXCHG8B  CoMPare and eXCHanGe 8 bytes  Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX. 
RDMSR  ReaD from Modelspecific register  Load MSR specified by ECX into EDX:EAX 
RDTSC  ReaD Time Stamp Counter  Returns in EDX:EAX the number of processor ticks since the processor was last reset (normally, since the system was last powered on) 
WRMSR  WRite to ModelSpecific Register  Write the value in EDX:EAX to MSR specified by ECX 
RSM^{[4]}  Resume from System Management Mode  Introduced with the i386SL; also present in the i486SL and later processors. Resumes from System Management Mode (SMM) 
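The CMPXCHG8B description corresponds to the following C sketch. It models the register pairs EDX:EAX and ECX:EBX as 64-bit integers and is not atomic; the hardware instruction performs the compare and the store as one atomic operation when used with the LOCK prefix. `cmpxchg8b` is a hypothetical name, and the `bool` result stands in for ZF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic model of CMPXCHG8B m64 (EDX:EAX = expected, ECX:EBX = replacement). */
static bool cmpxchg8b(uint64_t *m64, uint64_t *edx_eax, uint64_t ecx_ebx)
{
    if (*m64 == *edx_eax) {
        *m64 = ecx_ebx;      /* equal: ZF = 1, store ECX:EBX into m64 */
        return true;
    }
    *edx_eax = *m64;         /* not equal: ZF = 0, load m64 into EDX:EAX */
    return false;
}
```

In 32-bit x86 builds, C11 `atomic_compare_exchange_strong` on a 64-bit object is commonly compiled to LOCK CMPXCHG8B, though the exact code generated depends on the compiler.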
Added with Pentium MMX
Instruction  Meaning  Notes 

RDPMC  Read the PMC [Performance Monitoring Counter]  Reads the counter specified in the ECX register into EDX:EAX 
MMX registers and MMX support instructions were also added. The MMX registers are shared with the x87 floating-point registers; see below.
Added with AMD K6
Instruction  Meaning  Notes 

SYSCALL  functionally equivalent to SYSENTER  
SYSRET  functionally equivalent to SYSEXIT 
AMD changed the CPUID detection bit for this feature from the K6-II onward.
Added with Pentium Pro
Instruction  Meaning  Notes 

CMOVcc  Conditional move  (CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, CMOVNZ, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ) 
UD2  Undefined Instruction  Generates an invalid opcode. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose. 
Added with Pentium II
Instruction  Meaning  Notes 

SYSENTER  SYStem call ENTER  Sometimes called the Fast System Call instruction, this instruction was intended to increase the performance of operating system calls. Note that on the Pentium Pro, the CPUID instruction incorrectly reports these instructions as available. 
SYSEXIT  SYStem call EXIT 
Added with SSE
Instruction  Opcode  Meaning  Notes 

NOP r/m16  0F 1F /0  Multi-byte no-operation instruction.  
NOP r/m32  
PREFETCHT0  0F 18 /1  Prefetch Data from Address  Prefetch into all cache levels 
PREFETCHT1  0F 18 /2  Prefetch Data from Address  Prefetch into all cache levels EXCEPT^{[5]}^{[6]} L1 
PREFETCHT2  0F 18 /3  Prefetch Data from Address  Prefetch into all cache levels EXCEPT L1 and L2 
PREFETCHNTA  0F 18 /0  Prefetch Data from Address  Prefetch to non-temporal cache structure, minimizing cache pollution. 
SFENCE  0F AE F8  Store Fence  Processor hint to make sure all store operations that took place prior to the SFENCE call are globally visible 
Added with SSE2
Instruction  Opcode  Meaning  Notes 

CLFLUSH m8  0F AE /7  Cache Line Flush  Invalidates the cache line that contains the linear address specified with the source operand from all levels of the processor cache hierarchy 
LFENCE  0F AE E8  Load Fence  Serializes load operations. 
MFENCE  0F AE F0  Memory Fence  Performs a serializing operation on all load and store instructions that were issued prior to the MFENCE instruction. 
MOVNTI m32, r32  0F C3 /r  Move Doubleword Non-Temporal  Move doubleword from r32 to m32, minimizing pollution in the cache hierarchy. 
PAUSE  F3 90  Spin Loop Hint  Provides a hint to the processor that the following code is a spin-wait loop, which can improve performance and reduce power consumption 
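PAUSE is typically combined with an atomic exchange (XCHG) in a spin-wait loop. The following C11 sketch of a test-and-test-and-set spinlock assumes a GCC- or Clang-style compiler, where on x86 `_mm_pause()` from `<immintrin.h>` emits PAUSE and `atomic_exchange` is normally compiled to XCHG; the `spinlock`, `spin_lock` and `spin_unlock` names and the `CPU_RELAX` macro are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>              /* _mm_pause() emits PAUSE */
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)       /* no spin-loop hint on other targets */
#endif

typedef struct { atomic_bool locked; } spinlock;

static void spin_lock(spinlock *l)
{
    /* Try to take the lock with an atomic exchange (XCHG on x86). */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire)) {
        /* Wait with plain loads until the lock looks free, hinting a spin-wait loop. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            CPU_RELAX();
    }
}

static void spin_unlock(spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Spinning on plain loads rather than repeated exchanges keeps the cache line shared while the lock is held, and PAUSE reduces the cost of leaving the loop once it is released.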
Added with SSE3
Instruction  Meaning  Notes 

MONITOR EAX, ECX, EDX  Set Up Monitor Address  Sets up a linear address range to be monitored by hardware and activates the monitor. 
MWAIT EAX, ECX  Monitor Wait  Processor hint to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. 
Added with SSE4.2
Instruction  Opcode  Meaning  Notes 

CRC32 r32, r/m8  F2 0F 38 F0 /r  Accumulate CRC32  Computes CRC value using the CRC32C (Castagnoli) polynomial 0x11EDC6F41 (normal form 0x1EDC6F41). This is the polynomial used in iSCSI. In contrast to the more popular one used in Ethernet, its parity is even, and it can thus detect any error with an odd number of changed bits. 
CRC32 r32, r/m8  F2 REX 0F 38 F0 /r  
CRC32 r32, r/m16  F2 0F 38 F1 /r  
CRC32 r32, r/m32  F2 0F 38 F1 /r  
CRC32 r64, r/m8  F2 REX.W 0F 38 F0 /r  
CRC32 r64, r/m64  F2 REX.W 0F 38 F1 /r  
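A bit-at-a-time software version of CRC-32C can serve as a reference for the CRC32 instruction. This sketch uses 0x82F63B78, the bit-reflected form of the polynomial 0x1EDC6F41 given above, together with the initial value 0xFFFFFFFF and final inversion used by the CRC-32C standard; the CRC32 instruction performs only the update step, so the caller applies the initial value and final inversion. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One CRC-32C update over n bytes (the step the CRC32 instruction accelerates). */
static uint32_t crc32c_update(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            /* Shift right; if the bit shifted out was 1, XOR in the reflected polynomial. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Standard CRC-32C: initial value 0xFFFFFFFF, final bitwise inversion. */
static uint32_t crc32c(const void *data, size_t n)
{
    return ~crc32c_update(0xFFFFFFFFu, data, n);
}
```

On processors with SSE4.2, the byte loop can instead call the `_mm_crc32_u8` intrinsic from `<nmmintrin.h>`. For the string "123456789", the standard CRC-32C check value is 0xE3069283.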
Added with x86-64
Instruction  Meaning  Notes 

CDQE  Sign extend EAX into RAX  
CQO  Sign extend RAX into RDX:RAX  
CMPSQ  CoMPare String Quadword  
CMPXCHG16B  CoMPare and eXCHanGe 16 Bytes  
IRETQ  64-bit Return from Interrupt  
JRCXZ  Jump if RCX is zero  
LODSQ  LoaD String Quadword  
MOVSXD  MOV with Sign Extend 32-bit to 64-bit  
POPFQ  POP RFLAGS Register  
PUSHFQ  PUSH RFLAGS Register  
RDTSCP  ReaD Time Stamp Counter and Processor ID  
SCASQ  SCAn String Quadword  
STOSQ  STOre String Quadword  
SWAPGS  Exchange GS base with KernelGSBase MSR 
Added with AMD-V
Instruction  Meaning  Notes  Opcode 

CLGI  Clear Global Interrupt Flag  Clears the GIF  0x0F 0x01 0xDD 
INVLPGA  Invalidate TLB entry in a specified ASID  Invalidates the TLB mapping for the virtual page specified in RAX and the ASID specified in ECX.  0x0F 0x01 0xDF 
MOV(CRn)  Move to or from control registers  Moves 32- or 64-bit contents to a control register and vice versa.  0x0F 0x22 or 0x0F 0x20 
MOV(DRn)  Move to or from debug registers  Moves 32- or 64-bit contents to a debug register and vice versa.  0x0F 0x21 or 0x0F 0x23 
SKINIT  Secure Init and Jump with Attestation  Verifiable startup of trusted software based on secure hash comparison  0x0F 0x01 0xDE 
STGI  Set Global Interrupt Flag  Sets the GIF.  0x0F 0x01 0xDC 
VMLOAD  Load state From VMCB  Loads a subset of processor state from the VMCB specified by the physical address in the RAX register.  0x0F 0x01 0xDA 
VMMCALL  Call VMM  Used exclusively to communicate with VMM  0x0F 0x01 0xD9 
VMRUN  Run virtual machine  Performs a switch to the guest OS.  0x0F 0x01 0xD8 
VMSAVE  Save state To VMCB  Saves additional guest state to VMCB.  0x0F 0x01 0xDB 
Added with Intel VT-x
Instruction  Meaning  Notes  Opcode 

INVEPT  Invalidate Translations Derived from EPT  Invalidates EPT-derived entries in the TLBs and paging-structure caches.  0x66 0x0F 0x38 0x80 
INVVPID  Invalidate Translations Based on VPID  Invalidates entries in the TLBs and paging-structure caches based on VPID.  0x66 0x0F 0x38 0x81 
VMFUNC  Invoke VM function  Invoke VM function specified in EAX.  0x0F 0x01 0xD4 
VMPTRLD  Load Pointer to Virtual-Machine Control Structure  Loads the current VMCS pointer from memory.  0x0F 0xC7/6 
VMPTRST  Store Pointer to Virtual-Machine Control Structure  Stores the current-VMCS pointer into a specified memory address. The operand of this instruction is always 64 bits and is always in memory.  0x0F 0xC7/7 
VMCLEAR  Clear Virtual-Machine Control Structure  Writes any cached data to the VMCS  0x66 0x0F 0xC7/6 
VMREAD  Read Field from Virtual-Machine Control Structure  Reads out a field in the VMCS  0x0F 0x78 
VMWRITE  Write Field to Virtual-Machine Control Structure  Modifies a field in the VMCS  0x0F 0x79 
VMCALL  Call to VM Monitor  Calls VM Monitor function from Guest System  0x0F 0x01 0xC1 
VMLAUNCH  Launch Virtual Machine  Launch virtual machine managed by current VMCS  0x0F 0x01 0xC2 
VMRESUME  Resume Virtual Machine  Resume virtual machine managed by current VMCS  0x0F 0x01 0xC3 
VMXOFF  Leave VMX Operation  Stops hardware supported virtualisation environment  0x0F 0x01 0xC4 
VMXON  Enter VMX Operation  Enters hardware supported virtualisation environment  0xF3 0x0F 0xC7/6 
Added with ABM
LZCNT, POPCNT (POPulation CouNT) – advanced bit manipulation
Added with BMI1
ANDN, BEXTR, BLSI, BLSMSK, BLSR, TZCNT
Added with BMI2
BZHI, MULX, PDEP, PEXT, RORX, SARX, SHRX, SHLX
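The BMI2 bit-extract and bit-deposit operations PEXT and PDEP can be modeled bit by bit in portable C. This sketch is for illustration; `pext64` and `pdep64` are hypothetical names, and the loops use `mask & -mask` (the BMI1 BLSI operation) to isolate the lowest remaining mask bit and `mask &= mask - 1` (BLSR) to clear it.

```c
#include <stdint.h>

/* PEXT: gather the bits of src selected by mask into the low bits of the result. */
static uint64_t pext64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t out = 1; mask != 0; mask &= mask - 1, out <<= 1)
        if (src & mask & -mask)          /* test src at the lowest remaining mask bit */
            result |= out;
    return result;
}

/* PDEP: scatter the low bits of src to the positions selected by mask. */
static uint64_t pdep64(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t in = 1; mask != 0; mask &= mask - 1, in <<= 1)
        if (src & in)
            result |= mask & -mask;      /* set the lowest remaining mask bit */
    return result;
}
```

For instance, with the mask 0xFF00FF00, PEXT gathers bytes 1 and 3 of 0x12345678 into 0x1256, and PDEP scatters 0x1256 back to 0x12005600.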
Added with TBM
AMD introduced TBM together with BMI1 in its Piledriver^{[7]} line of processors; later AMD Jaguar and Zen-based processors do not support TBM.^{[8]} No Intel processors (as of 2020) support TBM.
Instruction  Description^{[9]}  Equivalent C expression^{[10]} 

BEXTR  Bit field extract (with immediate)  (src >> start) & ((1 << len) - 1) 
BLCFILL  Fill from lowest clear bit  x & (x + 1) 
BLCI  Isolate lowest clear bit  x | ~(x + 1) 
BLCIC  Isolate lowest clear bit and complement  ~x & (x + 1) 
BLCMSK  Mask from lowest clear bit  x ^ (x + 1) 
BLCS  Set lowest clear bit  x | (x + 1) 
BLSFILL  Fill from lowest set bit  x | (x - 1) 
BLSIC  Isolate lowest set bit and complement  ~x | (x - 1) 
T1MSKC  Inverse mask from trailing ones  ~x | (x + 1) 
TZMSK  Mask from trailing zeros  ~x & (x - 1) 
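The C expressions in the TBM table can be checked directly. The following sketch wraps each expression in a hypothetical helper on 32-bit unsigned operands; the `bextr` helper takes its start and length as separate arguments for clarity (TBM's BEXTR encodes them in an immediate) and guards the cases where a C shift by 32 or more would be undefined.

```c
#include <stdint.h>

/* The table's equivalent C expressions, one hypothetical helper per instruction. */
static uint32_t blcfill(uint32_t x) { return x & (x + 1); }
static uint32_t blci(uint32_t x)    { return x | ~(x + 1); }
static uint32_t blcic(uint32_t x)   { return ~x & (x + 1); }
static uint32_t blcmsk(uint32_t x)  { return x ^ (x + 1); }
static uint32_t blcs(uint32_t x)    { return x | (x + 1); }
static uint32_t blsfill(uint32_t x) { return x | (x - 1); }
static uint32_t blsic(uint32_t x)   { return ~x | (x - 1); }
static uint32_t t1mskc(uint32_t x)  { return ~x | (x + 1); }
static uint32_t tzmsk(uint32_t x)   { return ~x & (x - 1); }

/* Bit field extract; out-of-range start/len are clamped to keep the C shifts defined. */
static uint32_t bextr(uint32_t src, unsigned start, unsigned len)
{
    if (start >= 32)
        return 0;
    uint32_t shifted = src >> start;
    return len >= 32 ? shifted : shifted & ((1u << len) - 1);
}
```

For x = 0x57 (binary 01010111), whose lowest clear bit is bit 3, BLCFILL gives 0x50, BLCS gives 0x5F, BLCMSK gives 0x0F and BLCIC gives 0x08.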
Added with CLMUL instruction set
Instruction  Opcode  Description 

PCLMULQDQ xmmreg,xmmrm,imm  66 0f 3a 44 /r ib  Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^{k}). 
PCLMULLQLQDQ xmmreg,xmmrm  66 0f 3a 44 /r 00  Multiply the low halves of the two registers. 
PCLMULHQLQDQ xmmreg,xmmrm  66 0f 3a 44 /r 01  Multiply the high half of the destination register by the low half of the source register. 
PCLMULLQHQDQ xmmreg,xmmrm  66 0f 3a 44 /r 10  Multiply the low half of the destination register by the high half of the source register. 
PCLMULHQHQDQ xmmreg,xmmrm  66 0f 3a 44 /r 11  Multiply the high halves of the two registers. 
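Carry-less multiplication is ordinary long multiplication with XOR in place of addition. The following C sketch computes the full 128-bit product of two 64-bit operands, which is what PCLMULQDQ produces for the pair of quadwords selected by its immediate; the `u128` type and `clmul64` name are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* 64 x 64 -> 128-bit carry-less product: partial products are XORed, never added. */
static u128 clmul64(uint64_t a, uint64_t b)
{
    u128 r = { 0, 0 };
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)
                r.hi ^= a >> (64 - i);   /* bits of a shifted past bit 63 */
        }
    }
    return r;
}
```

For example, 3 × 3 is 5 in carry-less arithmetic, since (x + 1)^2 = x^2 + 1 over GF(2).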
Added with Intel ADX
Instruction  Description 

ADCX  Adds two unsigned integers plus carry, reading the carry from the carry flag and, if necessary, setting it there. Does not affect any flags other than CF. 
ADOX  Adds two unsigned integers plus carry, reading the carry from the overflow flag and, if necessary, setting it there. Does not affect any flags other than OF. 
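ADCX and ADOX speed up the add-with-carry chains used in multi-precision arithmetic. The following portable C sketch adds two numbers stored as arrays of 64-bit limbs (least significant first) with a single explicit carry chain, which is what a sequence of ADC or ADCX instructions computes; keeping a second, independent chain in OF via ADOX is not modeled. `add_limbs` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n limbs; returns the final carry-out (like CF after the last ADC). */
static uint64_t add_limbs(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t ai = a[i], bi = b[i];
        uint64_t s = ai + carry;
        uint64_t c1 = s < ai;            /* carry out of a[i] + carry */
        uint64_t t = s + bi;
        uint64_t c2 = t < s;             /* carry out of s + b[i] */
        r[i] = t;
        carry = c1 | c2;                 /* at most one of the two can be set */
    }
    return carry;
}
```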
x87 floating-point instructions
Original 8087 instructions
Instruction  Meaning  Notes 

F2XM1  $2^x - 1$  more precise than computing $2^x$ and then subtracting 1 when x is close to zero 
FABS  Absolute value  
FADD  Add  
FADDP  Add and pop  
FBLD  Load BCD  
FBSTP  Store BCD and pop  
FCHS  Change sign  
FCLEX  Clear exceptions  
FCOM  Compare  
FCOMP  Compare and pop  
FCOMPP  Compare and pop twice  
FDECSTP  Decrement floating point stack pointer  
FDISI  Disable interrupts  8087 only, otherwise FNOP 
FDIV  Divide  Pentium FDIV bug 
FDIVP  Divide and pop  
FDIVR  Divide reversed  
FDIVRP  Divide reversed and pop  
FENI  Enable interrupts  8087 only, otherwise FNOP 
FFREE  Free register  
FIADD  Integer add  
FICOM  Integer compare  
FICOMP  Integer compare and pop  
FIDIV  Integer divide  
FIDIVR  Integer divide reversed  
FILD  Load integer  
FIMUL  Integer multiply  
FINCSTP  Increment floating point stack pointer  
FINIT  Initialize floating point processor  
FIST  Store integer  
FISTP  Store integer and pop  
FISUB  Integer subtract  
FISUBR  Integer subtract reversed  
FLD  Floating point load  
FLD1  Load 1.0 onto stack  
FLDCW  Load control word  
FLDENV  Load environment state  
FLDENVW  Load environment state, 16-bit  
FLDL2E  Load log_{2}(e) onto stack  
FLDL2T  Load log_{2}(10) onto stack  
FLDLG2  Load log_{10}(2) onto stack  
FLDLN2  Load ln(2) onto stack  
FLDPI  Load π onto stack  
FLDZ  Load 0.0 onto stack  
FMUL  Multiply  
FMULP  Multiply and pop  
FNCLEX  Clear exceptions, no wait  
FNDISI  Disable interrupts, no wait  8087 only, otherwise FNOP 
FNENI  Enable interrupts, no wait  8087 only, otherwise FNOP 
FNINIT  Initialize floating point processor, no wait  
FNOP  No operation  
FNSAVE  Save FPU state, no wait, 8-bit  
FNSAVEW  Save FPU state, no wait, 16-bit  
FNSTCW  Store control word, no wait  
FNSTENV  Store FPU environment, no wait  
FNSTENVW  Store FPU environment, no wait, 16-bit  
FNSTSW  Store status word, no wait  
FPATAN  Partial arctangent  
FPREM  Partial remainder  
FPTAN  Partial tangent  
FRNDINT  Round to integer  
FRSTOR  Restore saved state  
FRSTORW  Restore saved state  Perhaps not actually available in 8087 
FSAVE  Save FPU state  
FSAVEW  Save FPU state, 16-bit  
FSCALE  Scale by factor of 2  
FSQRT  Square root  
FST  Floating point store  
FSTCW  Store control word  
FSTENV  Store FPU environment  
FSTENVW  Store FPU environment, 16-bit  
FSTP  Store and pop  
FSTSW  Store status word  
FSUB  Subtract  
FSUBP  Subtract and pop  
FSUBR  Reverse subtract  
FSUBRP  Reverse subtract and pop  
FTST  Test for zero  
FWAIT  Wait while FPU is executing  
FXAM  Examine condition flags  
FXCH  Exchange registers  
FXTRACT  Extract exponent and significand  
FYL2X  y · log_{2} x  if y = log_{b} 2, then the base-b logarithm is computed 
FYL2XP1  y · log_{2} (x+1)  more precise than computing log_{2} z with z = x + 1 directly when x is close to zero 
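The note on FYL2X follows from the change-of-base rule for logarithms. Choosing y = log_b 2 gives:

```latex
\log_b 2 \cdot \log_2 x
  = \frac{\ln 2}{\ln b} \cdot \frac{\ln x}{\ln 2}
  = \frac{\ln x}{\ln b}
  = \log_b x,
\qquad
\ln 2 \cdot \log_2 (x + 1) = \ln (x + 1).
```

For example, loading y with FLDLN2 (ln 2) before FYL2X yields ln x, and loading it with FLDLG2 (log_{10} 2) yields log_{10} x; with y = ln 2, FYL2XP1 likewise yields ln(x + 1).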
Added in specific processors
Added with 80287
Instruction  Meaning  Notes 

FSETPM  Set protected mode  80287 only, otherwise FNOP 
Added with 80387
Instruction  Meaning  Notes 

FCOS  Cosine  
FLDENVD  Load environment state, 32-bit  
FSAVED  Save FPU state, 32-bit  
FPREM1  Partial remainder  Computes IEEE remainder 
FRSTORD  Restore saved state, 32-bit  
FSIN  Sine  
FSINCOS  Sine and cosine  
FSTENVD  Store FPU environment, 32-bit  
FUCOM  Unordered compare  
FUCOMP  Unordered compare and pop  
FUCOMPP  Unordered compare and pop twice 
Added with Pentium Pro
 FCMOV variants: FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, FCMOVNBE, FCMOVNE, FCMOVNU, FCMOVU
 FCOMI variants: FCOMI, FCOMIP, FUCOMI, FUCOMIP
Added with SSE
FXRSTOR, FXSAVE
These are also supported on later Pentium IIs which do not contain SSE support
Added with SSE3
FISTTP (x87 to integer conversion with truncation regardless of status word)
SIMD instructions
MMX instructions
MMX instructions operate on the mm registers, which are 64 bits wide. They are shared with the FPU registers.
Original MMX instructions
Added with Pentium MMX
Instruction  Opcode  Meaning  Notes 

EMMS  0F 77  Empty MMX Technology State  Marks all x87 FPU registers for use by FPU 
MOVD mm, r/m32  0F 6E /r  Move doubleword  
MOVD r/m32, mm  0F 7E /r  Move doubleword  
MOVQ mm/m64, mm  0F 7F /r  Move quadword  
MOVQ mm, mm/m64  0F 6F /r  Move quadword  
MOVQ mm, r/m64  REX.W + 0F 6E /r  Move quadword  
MOVQ r/m64, mm  REX.W + 0F 7E /r  Move quadword  
PACKSSDW mm1, mm2/m64  0F 6B /r  Pack doublewords to words (signed with saturation)  
PACKSSWB mm1, mm2/m64  0F 63 /r  Pack words to bytes (signed with saturation)  
PACKUSWB mm, mm/m64  0F 67 /r  Pack words to bytes (unsigned with saturation)  
PADDB mm, mm/m64  0F FC /r  Add packed byte integers  
PADDW mm, mm/m64  0F FD /r  Add packed word integers  
PADDD mm, mm/m64  0F FE /r  Add packed doubleword integers  
PADDQ mm, mm/m64  0F D4 /r  Add packed quadword integers  Added with SSE2 
PADDSB mm, mm/m64  0F EC /r  Add packed signed byte integers and saturate  
PADDSW mm, mm/m64  0F ED /r  Add packed signed word integers and saturate  
PADDUSB mm, mm/m64  0F DC /r  Add packed unsigned byte integers and saturate  
PADDUSW mm, mm/m64  0F DD /r  Add packed unsigned word integers and saturate  
PAND mm, mm/m64  0F DB /r  Bitwise AND  
PANDN mm, mm/m64  0F DF /r  Bitwise AND NOT  
POR mm, mm/m64  0F EB /r  Bitwise OR  
PXOR mm, mm/m64  0F EF /r  Bitwise XOR  
PCMPEQB mm, mm/m64  0F 74 /r  Compare packed bytes for equality  
PCMPEQW mm, mm/m64  0F 75 /r  Compare packed words for equality  
PCMPEQD mm, mm/m64  0F 76 /r  Compare packed doublewords for equality  
PCMPGTB mm, mm/m64  0F 64 /r  Compare packed signed byte integers for greater than  
PCMPGTW mm, mm/m64  0F 65 /r  Compare packed signed word integers for greater than  
PCMPGTD mm, mm/m64  0F 66 /r  Compare packed signed doubleword integers for greater than  
PMADDWD mm, mm/m64  0F F5 /r  Multiply packed words, add adjacent doubleword results  
PMULHW mm, mm/m64  0F E5 /r  Multiply packed signed word integers, store high 16 bits of results  
PMULLW mm, mm/m64  0F D5 /r  Multiply packed signed word integers, store low 16 bits of results  
PSLLW mm1, imm8  0F 71 /6 ib  Shift left words, shift in zeros  
PSLLW mm, mm/m64  0F F1 /r  Shift left words, shift in zeros  
PSLLD mm, imm8  0F 72 /6 ib  Shift left doublewords, shift in zeros  
PSLLD mm, mm/m64  0F F2 /r  Shift left doublewords, shift in zeros  
PSLLQ mm, imm8  0F 73 /6 ib  Shift left quadword, shift in zeros  
PSLLQ mm, mm/m64  0F F3 /r  Shift left quadword, shift in zeros  
PSRAD mm, imm8  0F 72 /4 ib  Shift right doublewords, shift in sign bits  
PSRAD mm, mm/m64  0F E2 /r  Shift right doublewords, shift in sign bits  
PSRAW mm, imm8  0F 71 /4 ib  Shift right words, shift in sign bits  
PSRAW mm, mm/m64  0F E1 /r  Shift right words, shift in sign bits  
PSRLW mm, imm8  0F 71 /2 ib  Shift right words, shift in zeros  
PSRLW mm, mm/m64  0F D1 /r  Shift right words, shift in zeros  
PSRLD mm, imm8  0F 72 /2 ib  Shift right doublewords, shift in zeros  
PSRLD mm, mm/m64  0F D2 /r  Shift right doublewords, shift in zeros  
PSRLQ mm, imm8  0F 73 /2 ib  Shift right quadword, shift in zeros  
PSRLQ mm, mm/m64  0F D3 /r  Shift right quadword, shift in zeros  
PSUBB mm, mm/m64  0F F8 /r  Subtract packed byte integers  
PSUBW mm, mm/m64  0F F9 /r  Subtract packed word integers  
PSUBD mm, mm/m64  0F FA /r  Subtract packed doubleword integers  
PSUBSB mm, mm/m64  0F E8 /r  Subtract signed packed bytes with saturation  
PSUBSW mm, mm/m64  0F E9 /r  Subtract signed packed words with saturation  
PSUBUSB mm, mm/m64  0F D8 /r  Subtract unsigned packed bytes with saturation  
PSUBUSW mm, mm/m64  0F D9 /r  Subtract unsigned packed words with saturation  
PUNPCKHBW mm, mm/m64  0F 68 /r  Unpack and interleave high-order bytes  
PUNPCKHWD mm, mm/m64  0F 69 /r  Unpack and interleave high-order words  
PUNPCKHDQ mm, mm/m64  0F 6A /r  Unpack and interleave high-order doublewords  
PUNPCKLBW mm, mm/m32  0F 60 /r  Unpack and interleave low-order bytes  
PUNPCKLWD mm, mm/m32  0F 61 /r  Unpack and interleave low-order words  
PUNPCKLDQ mm, mm/m32  0F 62 /r  Unpack and interleave low-order doublewords 
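The saturating additions differ from PADDB only in how overflow is handled: each byte lane clamps to its range instead of wrapping. This C sketch models PADDUSB and PADDSB on a 64-bit value holding eight byte lanes; the function names are hypothetical, and the real instructions operate on MMX registers rather than C integers.

```c
#include <stdint.h>

/* PADDUSB: eight unsigned byte additions, each clamped to 0..255. */
static uint64_t paddusb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        unsigned s = (unsigned)((a >> i) & 0xFF) + (unsigned)((b >> i) & 0xFF);
        if (s > 0xFF)
            s = 0xFF;                        /* unsigned saturation */
        r |= (uint64_t)s << i;
    }
    return r;
}

/* PADDSB: eight signed byte additions, each clamped to -128..127. */
static uint64_t paddsb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 8) {
        int x = (int)((a >> i) & 0xFF), y = (int)((b >> i) & 0xFF);
        if (x > 127) x -= 256;               /* reinterpret each byte as signed */
        if (y > 127) y -= 256;
        int s = x + y;
        if (s > 127) s = 127;                /* signed saturation */
        if (s < -128) s = -128;
        r |= (uint64_t)(uint8_t)s << i;
    }
    return r;
}
```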
MMX instructions added in specific processors
EMMI instructions
Added with 6x86MX from Cyrix, deprecated now
PAVEB, PADDSIW, PMAGW, PDISTIB, PSUBSIW, PMVZB, PMULHRW, PMVNZB, PMVLZB, PMVGEZB, PMULHRIW, PMACHRIW
MMX instructions added with MMX+ and SSE
The following MMX instructions were added with SSE. They are also available on the Athlon under the name MMX+.
Instruction  Opcode  Meaning 

MASKMOVQ mm1, mm2  0F F7 /r  Masked Move of Quadword 
MOVNTQ m64, mm  0F E7 /r  Move Quadword Using Non-Temporal Hint 
PSHUFW mm1, mm2/m64, imm8  0F 70 /r ib  Shuffle Packed Words 
PINSRW mm, r32/m16, imm8  0F C4 /r ib  Insert Word 
PEXTRW reg, mm, imm8  0F C5 /r ib  Extract Word 
PMOVMSKB reg, mm  0F D7 /r  Move Byte Mask 
PMINUB mm1, mm2/m64  0F DA /r  Minimum of Packed Unsigned Byte Integers 
PMAXUB mm1, mm2/m64  0F DE /r  Maximum of Packed Unsigned Byte Integers 
PAVGB mm1, mm2/m64  0F E0 /r  Average Packed Integers 
PAVGW mm1, mm2/m64  0F E3 /r  Average Packed Integers 
PMULHUW mm1, mm2/m64  0F E4 /r  Multiply Packed Unsigned Integers and Store High Result 
PMINSW mm1, mm2/m64  0F EA /r  Minimum of Packed Signed Word Integers 
PMAXSW mm1, mm2/m64  0F EE /r  Maximum of Packed Signed Word Integers 
PSADBW mm1, mm2/m64  0F F6 /r  Compute Sum of Absolute Differences 
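The rounding in PAVGB and the single scalar result of PSADBW are easy to misremember. The following minimal Python sketch models the 64-bit MMX forms. It assumes a register is a list of eight unsigned bytes. PAVGB rounds halves up, computing (x + y + 1) >> 1. PSADBW produces one sum of absolute differences, which the instruction writes to the low word of the destination while zeroing the remaining bits. The helper names are hypothetical.

```python
# Illustrative model of the MMX (64-bit) forms: 8 unsigned byte lanes, lane 0 first.

def pavgb(a, b):
    """PAVGB: per-lane unsigned average, rounding halves up."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

def psadbw(a, b):
    """PSADBW: sum of absolute byte differences.
    The hardware writes this sum (at most 8 * 255 = 2040) to the low word
    of the destination and zeroes the upper bits."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = [0, 1, 255, 254, 10, 20, 30, 40]
b = [0, 2, 255, 255, 40, 30, 20, 10]
print(pavgb(a, b))   # [0, 2, 255, 255, 25, 25, 25, 25]
print(psadbw(a, b))  # 82
```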
MMX instructions added with SSE2
The following MMX instructions were added with SSE2:
Instruction  Opcode  Meaning 

PSUBQ mm1, mm2/m64  0F FB /r  Subtract quadword integer 
PMULUDQ mm1, mm2/m64  0F F4 /r  Multiply unsigned doubleword integer 
MMX instructions added with SSSE3
Instruction  Opcode  Meaning 

PSIGNB mm1, mm2/m64  0F 38 08 /r  Negate/zero/preserve packed byte integers depending on corresponding sign 
PSIGNW mm1, mm2/m64  0F 38 09 /r  Negate/zero/preserve packed word integers depending on corresponding sign 
PSIGND mm1, mm2/m64  0F 38 0A /r  Negate/zero/preserve packed doubleword integers depending on corresponding sign 
PSHUFB mm1, mm2/m64  0F 38 00 /r  Shuffle bytes 
PMULHRSW mm1, mm2/m64  0F 38 0B /r  Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits 
PMADDUBSW mm1, mm2/m64  0F 38 04 /r  Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed words 
PHSUBW mm1, mm2/m64  0F 38 05 /r  Subtract and pack 16-bit signed integers horizontally 
PHSUBSW mm1, mm2/m64  0F 38 07 /r  Subtract and pack 16-bit signed integers horizontally with saturation 
PHSUBD mm1, mm2/m64  0F 38 06 /r  Subtract and pack 32-bit signed integers horizontally 
PHADDSW mm1, mm2/m64  0F 38 03 /r  Add and pack 16-bit signed integers horizontally with saturation 
PHADDW mm1, mm2/m64  0F 38 01 /r  Add and pack 16-bit integers horizontally 
PHADDD mm1, mm2/m64  0F 38 02 /r  Add and pack 32-bit integers horizontally 
PALIGNR mm1, mm2/m64, imm8  0F 3A 0F /r ib  Concatenate destination and source operands, extract byte-aligned result shifted to the right 
PABSB mm1, mm2/m64  0F 38 1C /r  Compute the absolute value of bytes and store unsigned result 
PABSW mm1, mm2/m64  0F 38 1D /r  Compute the absolute value of 16-bit integers and store unsigned result 
PABSD mm1, mm2/m64  0F 38 1E /r  Compute the absolute value of 32-bit integers and store unsigned result 
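PSHUFB and PSIGNB are the least obvious entries in this table. The following Python sketch models their per-byte behaviour.

- PSHUFB zeroes a lane whose control byte has bit 7 set. Otherwise it selects a source byte using the low index bits: 3 bits for the 64-bit MMX form, 4 for the XMM form.
- PSIGNB negates, zeroes, or keeps each byte according to the sign of the corresponding byte in the second operand.

Registers are assumed to be Python lists of lane values, and signed lanes are Python ints in the range -128..127. The helper names are hypothetical.

```python
# Illustrative model; registers are lists of lane values, lane 0 first.

def pshufb(a, ctl):
    """PSHUFB: dst[i] = 0 if bit 7 of ctl[i] is set, else a[ctl[i] & (n - 1)].
    n = len(a) is 8 for the MMX form and 16 for the XMM form."""
    n = len(a)
    return [0 if c & 0x80 else a[c & (n - 1)] for c in ctl]

def wrap8(v):
    """Two's-complement wraparound to a signed byte, so -(-128) stays -128."""
    return (v + 128) % 256 - 128

def psignb(a, b):
    """PSIGNB on signed bytes: -a[i] if b[i] < 0, 0 if b[i] == 0, else a[i]."""
    return [wrap8(-x) if y < 0 else 0 if y == 0 else x for x, y in zip(a, b)]

data = [10, 11, 12, 13, 14, 15, 16, 17]
print(pshufb(data, [7, 6, 5, 4, 3, 2, 1, 0]))        # [17, 16, 15, 14, 13, 12, 11, 10]
print(pshufb(data, [0x80, 0, 0x8F, 1, 2, 3, 4, 5]))  # [0, 10, 0, 11, 12, 13, 14, 15]
print(psignb([5, -7, 100, -128], [-1, -1, 0, -3]))   # [-5, 7, 0, -128]
```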
3DNow! instructions
Added with K6-2
FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD, PMULHRW, PREFETCH, PREFETCHW
3DNow!+ instructions
Added with Athlon and K6-2+
PF2IW, PFNACC, PFPNACC, PI2FW, PSWAPD
Added with Geode GX
PFRSQRTV, PFRCPV
SSE instructions
Added with Pentium III
SSE instructions operate on xmm registers, which are 128 bits wide.
SSE consists of the following SSE SIMD floating-point instructions:
Instruction  Opcode  Meaning 

ANDPS* xmm1, xmm2/m128  0F 54 /r  Bitwise Logical AND of Packed Single-Precision Floating-Point Values 
ANDNPS* xmm1, xmm2/m128  0F 55 /r  Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values 
ORPS* xmm1, xmm2/m128  0F 56 /r  Bitwise Logical OR of Packed Single-Precision Floating-Point Values 
XORPS* xmm1, xmm2/m128  0F 57 /r  Bitwise Logical XOR of Packed Single-Precision Floating-Point Values 
MOVUPS xmm1, xmm2/m128  0F 10 /r  Move Unaligned Packed Single-Precision Floating-Point Values 
MOVSS xmm1, xmm2/m32  F3 0F 10 /r  Move Scalar Single-Precision Floating-Point Values 
MOVUPS xmm2/m128, xmm1  0F 11 /r  Move Unaligned Packed Single-Precision Floating-Point Values 
MOVSS xmm2/m32, xmm1  F3 0F 11 /r  Move Scalar Single-Precision Floating-Point Values 
MOVLPS xmm, m64  0F 12 /r  Move Low Packed Single-Precision Floating-Point Values 
MOVHLPS xmm1, xmm2  0F 12 /r  Move Packed Single-Precision Floating-Point Values High to Low 
MOVLPS m64, xmm  0F 13 /r  Move Low Packed Single-Precision Floating-Point Values 
UNPCKLPS xmm1, xmm2/m128  0F 14 /r  Unpack and Interleave Low Packed Single-Precision Floating-Point Values 
UNPCKHPS xmm1, xmm2/m128  0F 15 /r  Unpack and Interleave High Packed Single-Precision Floating-Point Values 
MOVHPS xmm, m64  0F 16 /r  Move High Packed Single-Precision Floating-Point Values 
MOVLHPS xmm1, xmm2  0F 16 /r  Move Packed Single-Precision Floating-Point Values Low to High 
MOVHPS m64, xmm  0F 17 /r  Move High Packed Single-Precision Floating-Point Values 
MOVAPS xmm1, xmm2/m128  0F 28 /r  Move Aligned Packed Single-Precision Floating-Point Values 
MOVAPS xmm2/m128, xmm1  0F 29 /r  Move Aligned Packed Single-Precision Floating-Point Values 
MOVMSKPS reg, xmm  0F 50 /r  Extract Packed Single-Precision Floating-Point 4-bit Sign Mask. The upper bits of the register are filled with zeros. 
CVTPI2PS xmm, mm/m64  0F 2A /r  Convert Packed Dword Integers to Packed Single-Precision FP Values 
CVTSI2SS xmm, r/m32  F3 0F 2A /r  Convert Dword Integer to Scalar Single-Precision FP Value 
CVTSI2SS xmm, r/m64  F3 REX.W 0F 2A /r  Convert Qword Integer to Scalar Single-Precision FP Value 
MOVNTPS m128, xmm  0F 2B /r  Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint 
CVTTPS2PI mm, xmm/m64  0F 2C /r  Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers 
CVTTSS2SI r32, xmm/m32  F3 0F 2C /r  Convert with Truncation Scalar Single-Precision FP Value to Dword Integer 
CVTTSS2SI r64, xmm1/m32  F3 REX.W 0F 2C /r  Convert with Truncation Scalar Single-Precision FP Value to Qword Integer 
CVTPS2PI mm, xmm/m64  0F 2D /r  Convert Packed Single-Precision FP Values to Packed Dword Integers 
CVTSS2SI r32, xmm/m32  F3 0F 2D /r  Convert Scalar Single-Precision FP Value to Dword Integer 
CVTSS2SI r64, xmm1/m32  F3 REX.W 0F 2D /r  Convert Scalar Single-Precision FP Value to Qword Integer 
UCOMISS xmm1, xmm2/m32  0F 2E /r  Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS 
COMISS xmm1, xmm2/m32  0F 2F /r  Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS 
SQRTPS xmm1, xmm2/m128  0F 51 /r  Compute Square Roots of Packed Single-Precision Floating-Point Values 
SQRTSS xmm1, xmm2/m32  F3 0F 51 /r  Compute Square Root of Scalar Single-Precision Floating-Point Value 
RSQRTPS xmm1, xmm2/m128  0F 52 /r  Compute Approximate Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values 
RSQRTSS xmm1, xmm2/m32  F3 0F 52 /r  Compute Approximate Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value 
RCPPS xmm1, xmm2/m128  0F 53 /r  Compute Approximate Reciprocals of Packed Single-Precision Floating-Point Values 
RCPSS xmm1, xmm2/m32  F3 0F 53 /r  Compute Approximate Reciprocal of Scalar Single-Precision Floating-Point Value 
ADDPS xmm1, xmm2/m128  0F 58 /r  Add Packed Single-Precision Floating-Point Values 
ADDSS xmm1, xmm2/m32  F3 0F 58 /r  Add Scalar Single-Precision Floating-Point Values 
MULPS xmm1, xmm2/m128  0F 59 /r  Multiply Packed Single-Precision Floating-Point Values 
MULSS xmm1, xmm2/m32  F3 0F 59 /r  Multiply Scalar Single-Precision Floating-Point Values 
SUBPS xmm1, xmm2/m128  0F 5C /r  Subtract Packed Single-Precision Floating-Point Values 
SUBSS xmm1, xmm2/m32  F3 0F 5C /r  Subtract Scalar Single-Precision Floating-Point Values 
MINPS xmm1, xmm2/m128  0F 5D /r  Return Minimum Packed Single-Precision Floating-Point Values 
MINSS xmm1, xmm2/m32  F3 0F 5D /r  Return Minimum Scalar Single-Precision Floating-Point Values 
DIVPS xmm1, xmm2/m128  0F 5E /r  Divide Packed Single-Precision Floating-Point Values 
DIVSS xmm1, xmm2/m32  F3 0F 5E /r  Divide Scalar Single-Precision Floating-Point Values 
MAXPS xmm1, xmm2/m128  0F 5F /r  Return Maximum Packed Single-Precision Floating-Point Values 
MAXSS xmm1, xmm2/m32  F3 0F 5F /r  Return Maximum Scalar Single-Precision Floating-Point Values 
LDMXCSR m32  0F AE /2  Load MXCSR Register State 
STMXCSR m32  0F AE /3  Store MXCSR Register State 
CMPPS xmm1, xmm2/m128, imm8  0F C2 /r ib  Compare Packed Single-Precision Floating-Point Values 
CMPSS xmm1, xmm2/m32, imm8  F3 0F C2 /r ib  Compare Scalar Single-Precision Floating-Point Values 
SHUFPS xmm1, xmm2/m128, imm8  0F C6 /r ib  Shuffle Packed Single-Precision Floating-Point Values 
* The single-precision bitwise instructions ANDPS, ANDNPS, ORPS and XORPS produce the same results as the SSE2 integer instructions (PAND, PANDN, POR, PXOR) and the double-precision ones (ANDPD, ANDNPD, ORPD, XORPD). However, on some processors they can add latency, a bypass delay for the domain change, when applied to values of the other type.^{[11]}
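The note above says that the single-precision bitwise instructions act on raw bits, just as PAND and PXOR do. The following minimal Python sketch models one 32-bit lane. It clears the sign bit with an AND mask, the usual ANDPS absolute-value idiom, and flips it with an XOR mask, the XORPS negation idiom. The helper names are hypothetical, and the struct-based reinterpretation stands in for a register lane.

```python
import struct

SIGN_BIT = 0x80000000
ABS_MASK = 0x7FFFFFFF  # every bit except the sign bit

def f32_to_bits(x):
    """Reinterpret a single-precision value as its 32-bit pattern."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_to_f32(u):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack('<f', struct.pack('<I', u))[0]

def andps_abs(x):
    """One lane of ANDPS x, ABS_MASK: clears the sign bit, giving |x|."""
    return bits_to_f32(f32_to_bits(x) & ABS_MASK)

def xorps_neg(x):
    """One lane of XORPS x, SIGN_BIT: flips the sign bit, giving -x."""
    return bits_to_f32(f32_to_bits(x) ^ SIGN_BIT)

print(andps_abs(-2.5))                   # 2.5
print(xorps_neg(1.5))                    # -1.5
print(hex(f32_to_bits(xorps_neg(0.0))))  # 0x80000000, i.e. -0.0
```

Because the masks act only on bits, applying PAND or PXOR with the same masks would yield identical bit patterns. Only the execution domain differs.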
SSE2 instructions
Added with Pentium 4
SSE2 SIMD floating-point instructions
SSE2 data movement instructions
Instruction  Opcode  Meaning 

MOVAPD xmm1, xmm2/m128  66 0F 28 /r  Move Aligned Packed Double-Precision Floating-Point Values 
MOVAPD xmm2/m128, xmm1  66 0F 29 /r  Move Aligned Packed Double-Precision Floating-Point Values 
MOVNTPD m128, xmm1  66 0F 2B /r  Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint 
MOVHPD xmm1, m64  66 0F 16 /r  Move High Packed Double-Precision Floating-Point Value 
MOVHPD m64, xmm1  66 0F 17 /r  Move High Packed Double-Precision Floating-Point Value 
MOVLPD xmm1, m64  66 0F 12 /r  Move Low Packed Double-Precision Floating-Point Value 
MOVLPD m64, xmm1  66 0F 13 /r  Move Low Packed Double-Precision Floating-Point Value 
MOVUPD xmm1, xmm2/m128  66 0F 10 /r  Move Unaligned Packed Double-Precision Floating-Point Values 
MOVUPD xmm2/m128, xmm1  66 0F 11 /r  Move Unaligned Packed Double-Precision Floating-Point Values 
MOVMSKPD reg, xmm  66 0F 50 /r  Extract Packed Double-Precision Floating-Point Sign Mask 
MOVSD* xmm1, xmm2/m64  F2 0F 10 /r  Move or Merge Scalar Double-Precision Floating-Point Value 
MOVSD xmm1/m64, xmm2  F2 0F 11 /r  Move or Merge Scalar Double-Precision Floating-Point Value 
SSE2 packed arithmetic instructions
Instruction  Opcode  Meaning 

ADDPD xmm1, xmm2/m128  66 0F 58 /r  Add Packed Double-Precision Floating-Point Values 
ADDSD xmm1, xmm2/m64  F2 0F 58 /r  Add Scalar Double-Precision Floating-Point Value 
DIVPD xmm1, xmm2/m128  66 0F 5E /r  Divide Packed Double-Precision Floating-Point Values 
DIVSD xmm1, xmm2/m64  F2 0F 5E /r  Divide Scalar Double-Precision Floating-Point Value 
MAXPD xmm1, xmm2/m128  66 0F 5F /r  Return Maximum Packed Double-Precision Floating-Point Values 
MAXSD xmm1, xmm2/m64  F2 0F 5F /r  Return Maximum Scalar Double-Precision Floating-Point Value 
MINPD xmm1, xmm2/m128  66 0F 5D /r  Return Minimum Packed Double-Precision Floating-Point Values 
MINSD xmm1, xmm2/m64  F2 0F 5D /r  Return Minimum Scalar Double-Precision Floating-Point Value 
MULPD xmm1, xmm2/m128  66 0F 59 /r  Multiply Packed Double-Precision Floating-Point Values 
MULSD xmm1, xmm2/m64  F2 0F 59 /r  Multiply Scalar Double-Precision Floating-Point Value 
SQRTPD xmm1, xmm2/m128  66 0F 51 /r  Compute Square Roots of Packed Double-Precision Floating-Point Values 
SQRTSD xmm1, xmm2/m64  F2 0F 51 /r  Compute Square Root of Scalar Double-Precision Floating-Point Value 
SUBPD xmm1, xmm2/m128  66 0F 5C /r  Subtract Packed Double-Precision Floating-Point Values 
SUBSD xmm1, xmm2/m64  F2 0F 5C /r  Subtract Scalar Double-Precision Floating-Point Value 
SSE2 logical instructions
Instruction  Opcode  Meaning 

ANDPD xmm1, xmm2/m128  66 0F 54 /r  Bitwise Logical AND of Packed Double-Precision Floating-Point Values 
ANDNPD xmm1, xmm2/m128  66 0F 55 /r  Bitwise Logical AND NOT of Packed Double-Precision Floating-Point Values 
ORPD xmm1, xmm2/m128  66 0F 56 /r  Bitwise Logical OR of Packed Double-Precision Floating-Point Values 
XORPD xmm1, xmm2/m128  66 0F 57 /r  Bitwise Logical XOR of Packed Double-Precision Floating-Point Values 
SSE2 compare instructions
Instruction  Opcode  Meaning 

CMPPD xmm1, xmm2/m128, imm8  66 0F C2 /r ib  Compare Packed Double-Precision Floating-Point Values 
CMPSD* xmm1, xmm2/m64, imm8  F2 0F C2 /r ib  Compare Low Double-Precision Floating-Point Values 
COMISD xmm1, xmm2/m64  66 0F 2F /r  Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS 
UCOMISD xmm1, xmm2/m64  66 0F 2E /r  Unordered Compare Scalar Double-Precision Floating-Point Values and Set EFLAGS 
SSE2 shuffle and unpack instructions
Instruction  Opcode  Meaning 

SHUFPD xmm1, xmm2/m128, imm8  66 0F C6 /r ib  Packed Interleave Shuffle of Pairs of Double-Precision Floating-Point Values 
UNPCKHPD xmm1, xmm2/m128  66 0F 15 /r  Unpack and Interleave High Packed Double-Precision Floating-Point Values 
UNPCKLPD xmm1, xmm2/m128  66 0F 14 /r  Unpack and Interleave Low Packed Double-Precision Floating-Point Values 
SSE2 conversion instructions
Instruction  Opcode  Meaning 

CVTDQ2PD xmm1, xmm2/m64  F3 0F E6 /r  Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point Values 
CVTDQ2PS xmm1, xmm2/m128  0F 5B /r  Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point Values 
CVTPD2DQ xmm1, xmm2/m128  F2 0F E6 /r  Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers 
CVTPD2PI mm, xmm/m128  66 0F 2D /r  Convert Packed Double-Precision FP Values to Packed Dword Integers 
CVTPD2PS xmm1, xmm2/m128  66 0F 5A /r  Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values 
CVTPI2PD xmm, mm/m64  66 0F 2A /r  Convert Packed Dword Integers to Packed Double-Precision FP Values 
CVTPS2DQ xmm1, xmm2/m128  66 0F 5B /r  Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values 
CVTPS2PD xmm1, xmm2/m64  0F 5A /r  Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values 
CVTSD2SI r32, xmm1/m64  F2 0F 2D /r  Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer 
CVTSD2SI r64, xmm1/m64  F2 REX.W 0F 2D /r  Convert Scalar Double-Precision Floating-Point Value to Quadword Integer With Sign Extension 
CVTSD2SS xmm1, xmm2/m64  F2 0F 5A /r  Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value 
CVTSI2SD xmm1, r32/m32  F2 0F 2A /r  Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value 
CVTSI2SD xmm1, r/m64  F2 REX.W 0F 2A /r  Convert Quadword Integer to Scalar Double-Precision Floating-Point Value 
CVTSS2SD xmm1, xmm2/m32  F3 0F 5A /r  Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value 
CVTTPD2DQ xmm1, xmm2/m128  66 0F E6 /r  Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers 
CVTTPD2PI mm, xmm/m128  66 0F 2C /r  Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers 
CVTTPS2DQ xmm1, xmm2/m128  F3 0F 5B /r  Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values 
CVTTSD2SI r32, xmm1/m64  F2 0F 2C /r  Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Dword Integer 
CVTTSD2SI r64, xmm1/m64  F2 REX.W 0F 2C /r  Convert with Truncation Scalar Double-Precision Floating-Point Value To Signed Qword Integer 
* CMPSD and MOVSD share their mnemonics with the string instructions CMPSD (CMPS) and MOVSD (MOVS). However, the former operate on scalar double-precision floating-point values, whereas the latter operate on doubleword strings.
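The conversion table distinguishes CVTSD2SI, which rounds according to the MXCSR rounding mode, from CVTTSD2SI, which always truncates. The following Python sketch models the 32-bit destination forms. It assumes MXCSR holds its default round-to-nearest-even mode and that the invalid-operation exception is masked. In that case, NaN or out-of-range inputs produce the "integer indefinite" value 0x80000000. The function names are hypothetical.

```python
import math

INT32_INDEFINITE = -2**31  # bit pattern 0x80000000

def _to_int32(r):
    """Return r if it fits in a signed doubleword, else the indefinite value."""
    return r if -2**31 <= r <= 2**31 - 1 else INT32_INDEFINITE

def cvttsd2si32(x):
    """CVTTSD2SI r32: convert with truncation toward zero."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    return _to_int32(math.trunc(x))

def cvtsd2si32(x):
    """CVTSD2SI r32, assuming the default MXCSR mode (round to nearest, ties to even)."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    return _to_int32(round(x))  # Python's round() on floats also breaks ties to even

for v in (-2.7, 2.5, 3.5, 3e9, float('nan')):
    print(v, cvtsd2si32(v), cvttsd2si32(v))
```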
SSE2 SIMD integer instructions
SSE2 MMX-like instructions extended to SSE registers
SSE2 allows MMX instructions to be executed on SSE registers, processing twice as much data per instruction.
Instruction  Opcode  Meaning 

MOVD xmm, r/m32  66 0F 6E /r  Move doubleword 
MOVD r/m32, xmm  66 0F 7E /r  Move doubleword 
MOVQ xmm1, xmm2/m64  F3 0F 7E /r  Move quadword 
MOVQ xmm2/m64, xmm1  66 0F D6 /r  Move quadword 
MOVQ r/m64, xmm  66 REX.W 0F 7E /r  Move quadword 
MOVQ xmm, r/m64  66 REX.W 0F 6E /r  Move quadword 
PMOVMSKB reg, xmm  66 0F D7 /r  Move a byte mask, zeroing the upper bits of the register 
PEXTRW reg, xmm, imm8  66 0F C5 /r ib  Extract the specified word and move it to reg, setting bits 15:0 and zeroing the rest 
PINSRW xmm, r32/m16, imm8  66 0F C4 /r ib  Insert the low word of r32 (or m16) at the specified word position 
PACKSSDW xmm1, xmm2/m128  66 0F 6B /r  Convert 2 × 4 packed signed doubleword integers into 8 packed signed word integers with saturation 
PACKSSWB xmm1, xmm2/m128  66 0F 63 /r  Convert 2 × 8 packed signed word integers into 16 packed signed byte integers with saturation 
PACKUSWB xmm1, xmm2/m128  66 0F 67 /r  Convert 2 × 8 packed signed word integers into 16 packed unsigned byte integers with saturation 
PADDB xmm1, xmm2/m128  66 0F FC /r  Add packed byte integers 
PADDW xmm1, xmm2/m128  66 0F FD /r  Add packed word integers 
PADDD xmm1, xmm2/m128  66 0F FE /r  Add packed doubleword integers 
PADDQ xmm1, xmm2/m128  66 0F D4 /r  Add packed quadword integers. 
PADDSB xmm1, xmm2/m128  66 0F EC /r  Add packed signed byte integers with saturation 
PADDSW xmm1, xmm2/m128  66 0F ED /r  Add packed signed word integers with saturation 
PADDUSB xmm1, xmm2/m128  66 0F DC /r  Add packed unsigned byte integers with saturation 
PADDUSW xmm1, xmm2/m128  66 0F DD /r  Add packed unsigned word integers with saturation 
PAND xmm1, xmm2/m128  66 0F DB /r  Bitwise AND 
PANDN xmm1, xmm2/m128  66 0F DF /r  Bitwise AND NOT 
POR xmm1, xmm2/m128  66 0F EB /r  Bitwise OR 
PXOR xmm1, xmm2/m128  66 0F EF /r  Bitwise XOR 
PCMPEQB xmm1, xmm2/m128  66 0F 74 /r  Compare packed bytes for equality. 
PCMPEQW xmm1, xmm2/m128  66 0F 75 /r  Compare packed words for equality. 
PCMPEQD xmm1, xmm2/m128  66 0F 76 /r  Compare packed doublewords for equality. 
PCMPGTB xmm1, xmm2/m128  66 0F 64 /r  Compare packed signed byte integers for greater than 
PCMPGTW xmm1, xmm2/m128  66 0F 65 /r  Compare packed signed word integers for greater than 
PCMPGTD xmm1, xmm2/m128  66 0F 66 /r  Compare packed signed doubleword integers for greater than 
PMULLW xmm1, xmm2/m128  66 0F D5 /r  Multiply the packed word integers, store the low 16 bits of the results 
PMULHW xmm1, xmm2/m128  66 0F E5 /r  Multiply the packed signed word integers, store the high 16 bits of the results 
PMULHUW xmm1, xmm2/m128  66 0F E4 /r  Multiply packed unsigned word integers, store the high 16 bits of the results 
PMULUDQ xmm1, xmm2/m128  66 0F F4 /r  Multiply the unsigned low doubleword of each quadword, store the 64-bit products 
PSLLW xmm1, xmm2/m128  66 0F F1 /r  Shift words left while shifting in 0s 
PSLLW xmm1, imm8  66 0F 71 /6 ib  Shift words left while shifting in 0s 
PSLLD xmm1, xmm2/m128  66 0F F2 /r  Shift doublewords left while shifting in 0s 
PSLLD xmm1, imm8  66 0F 72 /6 ib  Shift doublewords left while shifting in 0s 
PSLLQ xmm1, xmm2/m128  66 0F F3 /r  Shift quadwords left while shifting in 0s 
PSLLQ xmm1, imm8  66 0F 73 /6 ib  Shift quadwords left while shifting in 0s 
PSRAD xmm1, xmm2/m128  66 0F E2 /r  Shift doublewords right while shifting in sign bits 
PSRAD xmm1, imm8  66 0F 72 /4 ib  Shift doublewords right while shifting in sign bits 
PSRAW xmm1, xmm2/m128  66 0F E1 /r  Shift words right while shifting in sign bits 
PSRAW xmm1, imm8  66 0F 71 /4 ib  Shift words right while shifting in sign bits 
PSRLW xmm1, xmm2/m128  66 0F D1 /r  Shift words right while shifting in 0s 
PSRLW xmm1, imm8  66 0F 71 /2 ib  Shift words right while shifting in 0s 
PSRLD xmm1, xmm2/m128  66 0F D2 /r  Shift doublewords right while shifting in 0s 
PSRLD xmm1, imm8  66 0F 72 /2 ib  Shift doublewords right while shifting in 0s 
PSRLQ xmm1, xmm2/m128  66 0F D3 /r  Shift quadwords right while shifting in 0s 
PSRLQ xmm1, imm8  66 0F 73 /2 ib  Shift quadwords right while shifting in 0s 
PSUBB xmm1, xmm2/m128  66 0F F8 /r  Subtract packed byte integers 
PSUBW xmm1, xmm2/m128  66 0F F9 /r  Subtract packed word integers 
PSUBD xmm1, xmm2/m128  66 0F FA /r  Subtract packed doubleword integers 
PSUBQ xmm1, xmm2/m128  66 0F FB /r  Subtract packed quadword integers. 
PSUBSB xmm1, xmm2/m128  66 0F E8 /r  Subtract packed signed byte integers with saturation 
PSUBSW xmm1, xmm2/m128  66 0F E9 /r  Subtract packed signed word integers with saturation 
PMADDWD xmm1, xmm2/m128  66 0F F5 /r  Multiply the packed word integers, add adjacent doubleword results 
PSUBUSB xmm1, xmm2/m128  66 0F D8 /r  Subtract packed unsigned byte integers with saturation 
PSUBUSW xmm1, xmm2/m128  66 0F D9 /r  Subtract packed unsigned word integers with saturation 
PUNPCKHBW xmm1, xmm2/m128  66 0F 68 /r  Unpack and interleave high-order bytes 
PUNPCKHWD xmm1, xmm2/m128  66 0F 69 /r  Unpack and interleave high-order words 
PUNPCKHDQ xmm1, xmm2/m128  66 0F 6A /r  Unpack and interleave high-order doublewords 
PUNPCKLBW xmm1, xmm2/m128  66 0F 60 /r  Interleave low-order bytes 
PUNPCKLWD xmm1, xmm2/m128  66 0F 61 /r  Interleave low-order words 
PUNPCKLDQ xmm1, xmm2/m128  66 0F 62 /r  Interleave low-order doublewords 
PAVGB xmm1, xmm2/m128  66 0F E0 /r  Average packed unsigned byte integers with rounding 
PAVGW xmm1, xmm2/m128  66 0F E3 /r  Average packed unsigned word integers with rounding 
PMINUB xmm1, xmm2/m128  66 0F DA /r  Compare packed unsigned byte integers and store packed minimum values 
PMINSW xmm1, xmm2/m128  66 0F EA /r  Compare packed signed word integers and store packed minimum values 
PMAXSW xmm1, xmm2/m128  66 0F EE /r  Compare packed signed word integers and store maximum packed values 
PMAXUB xmm1, xmm2/m128  66 0F DE /r  Compare packed unsigned byte integers and store packed maximum values 
PSADBW xmm1, xmm2/m128  66 0F F6 /r  Computes the absolute differences of the packed unsigned byte integers; the 8 low differences and 8 high differences are then summed separately to produce two unsigned word integer results 
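The word-multiply entries differ mainly in which part of the 32-bit product they keep. The following Python sketch models them, assuming each lane is a signed 16-bit value stored as a Python int. PMULLW keeps the low 16 bits, with wraparound rather than saturation. PMULHW keeps the high 16 bits. PMADDWD sums adjacent 32-bit products into doublewords. The function names are hypothetical.

```python
def s16(v):
    """Interpret the low 16 bits of v as a signed word."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def s32(v):
    """Wrap v to a signed doubleword."""
    return (v + 2**31) % 2**32 - 2**31

def pmullw(a, b):
    """PMULLW: low 16 bits of each signed product (wraps, no saturation)."""
    return [s16(x * y) for x, y in zip(a, b)]

def pmulhw(a, b):
    """PMULHW: high 16 bits of each signed 32-bit product."""
    return [s16((x * y) >> 16) for x, y in zip(a, b)]

def pmaddwd(a, b):
    """PMADDWD: multiply word pairs, add adjacent products into doublewords."""
    return [s32(a[i] * b[i] + a[i + 1] * b[i + 1]) for i in range(0, len(a), 2)]

print(pmullw([300, -2], [300, 3]))          # [24464, -6]: 90000 wraps modulo 65536
print(pmulhw([300, -2], [300, 3]))          # [1, -1]
print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 53]
print(pmaddwd([-32768, -32768], [-32768, -32768]))  # [-2147483648]: the one overflow case
```

The last line shows that PMADDWD can overflow only when all four inputs of a pair are -32768. That case wraps to 0x80000000.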
SSE2 integer instructions for SSE registers only
The following instructions can be used only on SSE registers, since by their nature they do not work on MMX registers:
Instruction  Opcode  Meaning 

MASKMOVDQU xmm1, xmm2  66 0F F7 /r  Non-Temporal Store of Selected Bytes from an XMM Register into Memory 
MOVDQ2Q mm, xmm  F2 0F D6 /r  Move low quadword from XMM to MMX register. 
MOVDQA xmm1, xmm2/m128  66 0F 6F /r  Move aligned double quadword 
MOVDQA xmm2/m128, xmm1  66 0F 7F /r  Move aligned double quadword 
MOVDQU xmm1, xmm2/m128  F3 0F 6F /r  Move unaligned double quadword 
MOVDQU xmm2/m128, xmm1  F3 0F 7F /r  Move unaligned double quadword 
MOVQ2DQ xmm, mm  F3 0F D6 /r  Move quadword from MMX register to low quadword of XMM register 
MOVNTDQ m128, xmm1  66 0F E7 /r  Store Packed Integers Using Non-Temporal Hint 
PSHUFHW xmm1, xmm2/m128, imm8  F3 0F 70 /r ib  Shuffle packed high words. 
PSHUFLW xmm1, xmm2/m128, imm8  F2 0F 70 /r ib  Shuffle packed low words. 
PSHUFD xmm1, xmm2/m128, imm8  66 0F 70 /r ib  Shuffle packed doublewords. 
PSLLDQ xmm1, imm8  66 0F 73 /7 ib  Shift double quadword left logical by imm8 bytes 
PSRLDQ xmm1, imm8  66 0F 73 /3 ib  Shift double quadword right logical by imm8 bytes 
PUNPCKHQDQ xmm1, xmm2/m128  66 0F 6D /r  Unpack and interleave high-order quadwords 
PUNPCKLQDQ xmm1, xmm2/m128  66 0F 6C /r  Interleave low-order quadwords 
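PSHUFD selects each destination doubleword using a 2-bit field of imm8. PSLLDQ and PSRLDQ shift the whole register by whole bytes rather than bits. The following minimal Python sketch models PSHUFD and PSRLDQ. It assumes a register is a list of four doublewords (for PSHUFD) or sixteen bytes with byte 0 least significant (for PSRLDQ). The function names are hypothetical.

```python
def pshufd(src, imm8):
    """PSHUFD: dst[i] = src[(imm8 >> 2*i) & 3] for i = 0..3."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

def psrldq(src, imm8):
    """PSRLDQ: shift 16 bytes right by imm8 bytes (byte 0 is least significant),
    filling with zeros; counts above 15 clear the register."""
    n = min(imm8, 16)
    return src[n:] + [0] * n

v = [10, 20, 30, 40]
print(pshufd(v, 0x1B))  # [40, 30, 20, 10]: reverses the doublewords
print(pshufd(v, 0x4E))  # [30, 40, 10, 20]: swaps the two quadwords
print(pshufd(v, 0x00))  # [10, 10, 10, 10]: broadcasts element 0
print(psrldq(list(range(16)), 4))  # bytes 4..15 move down, four zero bytes enter at the top
```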
SSE3 instructions
Added with Pentium 4 supporting SSE3
SSE3 SIMD floating-point instructions
Instruction  Opcode  Meaning  Notes 

ADDSUBPS xmm1, xmm2/m128  F2 0F D0 /r  Add/subtract single-precision floating-point values  for Complex Arithmetic 
ADDSUBPD xmm1, xmm2/m128  66 0F D0 /r  Add/subtract double-precision floating-point values  
MOVDDUP xmm1, xmm2/m64  F2 0F 12 /r  Move double-precision floating-point value and duplicate  
MOVSLDUP xmm1, xmm2/m128  F3 0F 12 /r  Move and duplicate even-index single-precision floating-point values  
MOVSHDUP xmm1, xmm2/m128  F3 0F 16 /r  Move and duplicate odd-index single-precision floating-point values  
HADDPS xmm1, xmm2/m128  F2 0F 7C /r  Horizontal add packed single-precision floating-point values  for Graphics 
HADDPD xmm1, xmm2/m128  66 0F 7C /r  Horizontal add packed double-precision floating-point values  
HSUBPS xmm1, xmm2/m128  F2 0F 7D /r  Horizontal subtract packed single-precision floating-point values  
HSUBPD xmm1, xmm2/m128  66 0F 7D /r  Horizontal subtract packed double-precision floating-point values 
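The "for Complex Arithmetic" note refers to a common pattern for multiplying two pairs of complex numbers stored as interleaved [real, imaginary] floats. The pattern combines MOVSLDUP, MOVSHDUP, a SHUFPS swap, and ADDSUBPS. The following Python sketch models that sequence lane by lane. It shows one possible instruction sequence, not the only one, and it ignores single-precision rounding. The helper names are hypothetical.

```python
# Illustrative lane-by-lane model of four-element single-precision registers.

def movsldup(v):
    """MOVSLDUP: duplicate the even-indexed elements."""
    return [v[0], v[0], v[2], v[2]]

def movshdup(v):
    """MOVSHDUP: duplicate the odd-indexed elements."""
    return [v[1], v[1], v[3], v[3]]

def shufps_b1(v):
    """SHUFPS v, v, 0xB1: swap neighbours, giving [v1, v0, v3, v2]."""
    return [v[1], v[0], v[3], v[2]]

def mulps(a, b):
    """MULPS: element-wise product."""
    return [x * y for x, y in zip(a, b)]

def addsubps(a, b):
    """ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [a[i] - b[i] if i % 2 == 0 else a[i] + b[i] for i in range(4)]

def complex_mul_pairs(a, b):
    """Multiply two complex pairs stored as [re0, im0, re1, im1]."""
    t1 = mulps(a, movsldup(b))             # [ar*br, ai*br, ...]
    t2 = mulps(shufps_b1(a), movshdup(b))  # [ai*bi, ar*bi, ...]
    return addsubps(t1, t2)                # [ar*br - ai*bi, ai*br + ar*bi, ...]

# (1+2j)(5+6j) = -7+16j and (3+4j)(7+8j) = -11+52j
print(complex_mul_pairs([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```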
SSE3 SIMD integer instructions
Instruction  Opcode  Meaning  Notes 

LDDQU xmm1, mem  F2 0F F0 /r  Load unaligned data and return double quadword  Functionally equivalent to MOVDQU; intended for video encoding 
SSSE3 instructions
Added with Xeon 5100 series and initial Core 2
The following MMX-like instructions extended to SSE registers were added with SSSE3:
Instruction  Opcode  Meaning 

PSIGNB xmm1, xmm2/m128  66 0F 38 08 /r  Negate/zero/preserve packed byte integers depending on corresponding sign 
PSIGNW xmm1, xmm2/m128  66 0F 38 09 /r  Negate/zero/preserve packed word integers depending on corresponding sign 
PSIGND xmm1, xmm2/m128  66 0F 38 0A /r  Negate/zero/preserve packed doubleword integers depending on corresponding sign 
PSHUFB xmm1, xmm2/m128  66 0F 38 00 /r  Shuffle bytes 
PMULHRSW xmm1, xmm2/m128  66 0F 38 0B /r  Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits 
PMADDUBSW xmm1, xmm2/m128  66 0F 38 04 /r  Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed words 
PHSUBW xmm1, xmm2/m128  66 0F 38 05 /r  Subtract and pack 16-bit signed integers horizontally 
PHSUBSW xmm1, xmm2/m128  66 0F 38 07 /r  Subtract and pack 16-bit signed integers horizontally with saturation 
PHSUBD xmm1, xmm2/m128  66 0F 38 06 /r  Subtract and pack 32-bit signed integers horizontally 
PHADDSW xmm1, xmm2/m128  66 0F 38 03 /r  Add and pack 16-bit signed integers horizontally with saturation 
PHADDW xmm1, xmm2/m128  66 0F 38 01 /r  Add and pack 16-bit integers horizontally 
PHADDD xmm1, xmm2/m128  66 0F 38 02 /r  Add and pack 32-bit integers horizontally 
PALIGNR xmm1, xmm2/m128, imm8  66 0F 3A 0F /r ib  Concatenate destination and source operands, extract byte-aligned result shifted to the right 
PABSB xmm1, xmm2/m128  66 0F 38 1C /r  Compute the absolute value of bytes and store unsigned result 
PABSW xmm1, xmm2/m128  66 0F 38 1D /r  Compute the absolute value of 16-bit integers and store unsigned result 
PABSD xmm1, xmm2/m128  66 0F 38 1E /r  Compute the absolute value of 32-bit integers and store unsigned result 
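PALIGNR is easiest to understand as a byte shift across two registers. The following Python sketch models the 128-bit form, assuming each register is a list of 16 bytes with byte 0 least significant. The destination forms the high half of a 32-byte value and the source forms the low half. The result is the 16 bytes starting imm8 bytes up. The function name is hypothetical.

```python
def palignr(dst, src, imm8):
    """PALIGNR model: take dst:src as one value (dst high, src low),
    shift right by imm8 bytes, keep the low len(dst) bytes; shifted-in bytes are 0."""
    n = len(dst)
    combined = src + dst  # index 0 is the least significant byte
    return [combined[i] if i < 2 * n else 0 for i in range(imm8, imm8 + n)]

src = list(range(0, 16))
dst = list(range(16, 32))
print(palignr(dst, src, 4))   # bytes 4..19: a 16-byte window starting 4 bytes into src
print(palignr(dst, src, 16))  # equal to dst
print(palignr(dst, src, 30))  # [30, 31] followed by 14 zero bytes
```

A typical use is extracting an unaligned 16-byte window from two consecutive aligned loads. Because n is taken from the operand length, the same model also covers the 8-byte MMX form.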
SSE4 instructions
SSE4.1
Added with Core 2 manufactured in 45 nm
SSE4.1 SIMD floating-point instructions
Instruction  Opcode  Meaning 

DPPS xmm1, xmm2/m128, imm8  66 0F 3A 40 /r ib  Selectively multiply packed SP floating-point values, add and selectively store 
DPPD xmm1, xmm2/m128, imm8  66 0F 3A 41 /r ib  Selectively multiply packed DP floating-point values, add and selectively store 
BLENDPS xmm1, xmm2/m128, imm8  66 0F 3A 0C /r ib  Select packed single-precision floating-point values from specified mask 
BLENDVPS xmm1, xmm2/m128, <XMM0>  66 0F 38 14 /r  Select packed single-precision floating-point values from specified mask 
BLENDPD xmm1, xmm2/m128, imm8  66 0F 3A 0D /r ib  Select packed DP FP values from specified mask 
BLENDVPD xmm1, xmm2/m128, <XMM0>  66 0F 38 15 /r  Select packed DP FP values from specified mask 
ROUNDPS xmm1, xmm2/m128, imm8  66 0F 3A 08 /r ib  Round packed single-precision floating-point values 
ROUNDSS xmm1, xmm2/m32, imm8  66 0F 3A 0A /r ib  Round the low packed single-precision floating-point value 
ROUNDPD xmm1, xmm2/m128, imm8  66 0F 3A 09 /r ib  Round packed double-precision floating-point values 
ROUNDSD xmm1, xmm2/m64, imm8  66 0F 3A 0B /r ib  Round the low packed double-precision floating-point value 
INSERTPS xmm1, xmm2/m32, imm8  66 0F 3A 21 /r ib  Insert a selected single-precision floating-point value at the specified destination element and zero out the destination elements selected by the mask 
EXTRACTPS reg/m32, xmm1, imm8  66 0F 3A 17 /r ib  Extract one single-precision floating-point value at specified offset and store the result (zero-extended, if applicable) 
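DPPS packs two masks into imm8. Bits 7:4 choose which element products enter the sum. Bits 3:0 choose which destination elements receive the sum, and the other elements are set to zero. The following minimal Python sketch models that masking. It uses Python floats and a simple left-to-right sum, so it does not reproduce the hardware's single-precision rounding or its exact order of additions. The function name is hypothetical.

```python
def dpps(a, b, imm8):
    """Model of DPPS on four-element lists a (destination) and b (source)."""
    total = 0.0
    for i in range(4):
        if imm8 & (0x10 << i):   # imm8 bit 4+i: include a[i]*b[i] in the sum
            total += a[i] * b[i]
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]  # imm8 bit i: write lane i

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(dpps(a, b, 0xF1))  # full dot product in lane 0: [70.0, 0.0, 0.0, 0.0]
print(dpps(a, b, 0x7F))  # 3-element dot product broadcast: [38.0, 38.0, 38.0, 38.0]
```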
SSE4.1 SIMD integer instructions
Instruction  Opcode  Meaning 

MPSADBW xmm1, xmm2/m128, imm8  66 0F 3A 42 /r ib  Sum absolute 8-bit integer differences of adjacent groups of 4 byte integers with starting offset 
PHMINPOSUW xmm1, xmm2/m128  66 0F 38 41 /r  Find the minimum unsigned word 
PMULLD xmm1, xmm2/m128  66 0F 38 40 /r  Multiply the packed dword signed integers and store the low 32 bits 
PMULDQ xmm1, xmm2/m128  66 0F 38 28 /r  Multiply packed signed doubleword integers and store quadword result 
PBLENDVB xmm1, xmm2/m128, <XMM0>  66 0F 38 10 /r  Select byte values from specified mask 
PBLENDW xmm1, xmm2/m128, imm8  66 0F 3A 0E /r ib  Select words from specified mask 
PMINSB xmm1, xmm2/m128  66 0F 38 38 /r  Compare packed signed byte integers and store packed minimum values 
PMINUW xmm1, xmm2/m128  66 0F 38 3A /r  Compare packed unsigned word integers and store packed minimum values 
PMINSD xmm1, xmm2/m128  66 0F 38 39 /r  Compare packed signed dword integers and store packed minimum values 
PMINUD xmm1, xmm2/m128  66 0F 38 3B /r  Compare packed unsigned dword integers and store packed minimum values 
PMAXSB xmm1, xmm2/m128  66 0F 38 3C /r  Compare packed signed byte integers and store packed maximum values 
PMAXUW xmm1, xmm2/m128  66 0F 38 3E /r  Compare packed unsigned word integers and store packed maximum values 
PMAXSD xmm1, xmm2/m128  66 0F 38 3D /r  Compare packed signed dword integers and store packed maximum values 
PMAXUD xmm1, xmm2/m128  66 0F 38 3F /r  Compare packed unsigned dword integers and store packed maximum values 
PINSRB xmm1, r32/m8, imm8  66 0F 3A 20 /r ib  Insert a byte integer value at specified destination element 
PINSRD xmm1, r/m32, imm8  66 0F 3A 22 /r ib  Insert a dword integer value at specified destination element 
PINSRQ xmm1, r/m64, imm8  66 REX.W 0F 3A 22 /r ib  Insert a qword integer value at specified destination element 
PEXTRB reg/m8, xmm2, imm8  66 0F 3A 14 /r ib  Extract a byte integer value at source byte offset, upper bits are zeroed. 
PEXTRW reg/m16, xmm, imm8  66 0F 3A 15 /r ib  Extract word and copy to lowest 16 bits, zero-extended 
PEXTRD r/m32, xmm2, imm8  66 0F 3A 16 /r ib  Extract a dword integer value at source dword offset 
PEXTRQ r/m64, xmm2, imm8  66 REX.W 0F 3A 16 /r ib  Extract a qword integer value at source qword offset 
PMOVSXBW xmm1, xmm2/m64  66 0F 38 20 /r  Sign extend 8 packed 8-bit integers to 8 packed 16-bit integers 
PMOVZXBW xmm1, xmm2/m64  66 0F 38 30 /r  Zero extend 8 packed 8-bit integers to 8 packed 16-bit integers 
PMOVSXBD xmm1, xmm2/m32  66 0F 38 21 /r  Sign extend 4 packed 8-bit integers to 4 packed 32-bit integers 
PMOVZXBD xmm1, xmm2/m32  66 0F 38 31 /r  Zero extend 4 packed 8-bit integers to 4 packed 32-bit integers 
PMOVSXBQ xmm1, xmm2/m16  66 0F 38 22 /r  Sign extend 2 packed 8-bit integers to 2 packed 64-bit integers 
PMOVZXBQ xmm1, xmm2/m16  66 0F 38 32 /r  Zero extend 2 packed 8-bit integers to 2 packed 64-bit integers 
PMOVSXWD xmm1, xmm2/m64  66 0F 38 23 /r  Sign extend 4 packed 16-bit integers to 4 packed 32-bit integers 
PMOVZXWD xmm1, xmm2/m64  66 0F 38 33 /r  Zero extend 4 packed 16-bit integers to 4 packed 32-bit integers 
PMOVSXWQ xmm1, xmm2/m32  66 0F 38 24 /r  Sign extend 2 packed 16-bit integers to 2 packed 64-bit integers 
PMOVZXWQ xmm1, xmm2/m32  66 0F 38 34 /r  Zero extend 2 packed 16-bit integers to 2 packed 64-bit integers 
PMOVSXDQ xmm1, xmm2/m64  66 0F 38 25 /r  Sign extend 2 packed 32-bit integers to 2 packed 64-bit integers 
PMOVZXDQ xmm1, xmm2/m64  66 0F 38 35 /r  Zero extend 2 packed 32-bit integers to 2 packed 64-bit integers 
PTEST xmm1, xmm2/m128  66 0F 38 17 /r  Set ZF if AND result is all 0s, set CF if AND NOT result is all 0s 
PCMPEQQ xmm1, xmm2/m128  66 0F 38 29 /r  Compare packed qwords for equality 
PACKUSDW xmm1, xmm2/m128  66 0F 38 2B /r  Convert 2 × 4 packed signed doubleword integers into 8 packed unsigned word integers with saturation 
MOVNTDQA xmm1, m128  66 0F 38 2A /r  Move double quadword using non-temporal hint if WC memory type 
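PTEST sets flags without writing a register. ZF reports whether the AND of the two operands is all zeros. CF reports whether the source ANDed with the complement of the destination is all zeros. The following Python sketch models each 128-bit operand as a Python integer, which is an assumption of this sketch. The function name is hypothetical.

```python
MASK128 = (1 << 128) - 1

def ptest(dest, src):
    """PTEST dest, src on 128-bit integer operands: returns (ZF, CF)."""
    zf = int((dest & src) == 0)
    cf = int((src & ~dest & MASK128) == 0)
    return zf, cf

print(ptest(0b1100, 0b0011))  # (1, 0): no common bits, src has bits outside dest
print(ptest(0b1111, 0b0011))  # (0, 1): common bits, src is a subset of dest
x = 0
print(ptest(x, x))            # (1, 1): PTEST x, x sets ZF exactly when x is zero
```

This last case is why PTEST with the same register as both operands, followed by a conditional jump on ZF, is a common way to test a vector register for all zeros.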
SSE4a
Added with Phenom processors
 EXTRQ/INSERTQ
 MOVNTSD/MOVNTSS
SSE4.2
Added with Nehalem processors
Instruction  Opcode  Meaning 

PCMPESTRI xmm1, xmm2/m128, imm8  66 0F 3A 61 /r ib  Packed comparison of string data with explicit lengths, generating an index 
PCMPESTRM xmm1, xmm2/m128, imm8  66 0F 3A 60 /r ib  Packed comparison of string data with explicit lengths, generating a mask 
PCMPISTRI xmm1, xmm2/m128, imm8  66 0F 3A 63 /r ib  Packed comparison of string data with implicit lengths, generating an index 
PCMPISTRM xmm1, xmm2/m128, imm8  66 0F 3A 62 /r ib  Packed comparison of string data with implicit lengths, generating a mask 
PCMPGTQ xmm1, xmm2/m128  66 0F 38 37 /r  Compare packed signed qwords for greater than 
SSE5 derived instructions
SSE5 was an SSE extension proposed by AMD. The bundle did not include the full set of Intel's SSE4 instructions, making it a competitor to SSE4 rather than a successor. AMD chose not to implement SSE5 as originally proposed; however, derived SSE extensions were introduced.
XOP
Introduced with the Bulldozer processor core and removed again starting with the Zen microarchitecture.
A revision of most of the SSE5 instruction set
F16C
Half-precision floating-point conversion.
Instruction  Meaning 

VCVTPH2PS xmmreg,xmmrm64  Convert four half-precision floating-point values in memory or the bottom half of an XMM register to four single-precision floating-point values in an XMM register 
VCVTPH2PS ymmreg,xmmrm128  Convert eight half-precision floating-point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision floating-point values in a YMM register 
VCVTPS2PH xmmrm64,xmmreg,imm8  Convert four single-precision floating-point values in an XMM register to half-precision floating-point values in memory or the bottom half of an XMM register 
VCVTPS2PH xmmrm128,ymmreg,imm8  Convert eight single-precision floating-point values in a YMM register to half-precision floating-point values in memory or an XMM register 
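Python's struct module has an "e" format for IEEE 754 half precision, which makes it easy to sketch what the 128-bit forms of VCVTPH2PS and VCVTPS2PH do to four packed values. This is a model under stated assumptions: inputs are first rounded to single precision to mimic the XMM source, only round-to-nearest-even (an imm8 rounding control of 0) is modelled, and out-of-range values raise OverflowError instead of producing infinity as the instruction would. The function names are hypothetical.

```python
import struct

def vcvtph2ps(src: bytes) -> list[float]:
    """Model of VCVTPH2PS xmm, xmm/m64: four binary16 values -> four floats."""
    if len(src) != 8:
        raise ValueError("expected 8 bytes (four half-precision values)")
    return list(struct.unpack("<4e", src))

def vcvtps2ph(values: list[float]) -> bytes:
    """Model of VCVTPS2PH xmm/m64, xmm, 0 (round to nearest even)."""
    if len(values) != 4:
        raise ValueError("expected four values")
    # Round the Python floats (binary64) to binary32 first, like a single-precision source.
    singles = struct.unpack("<4f", struct.pack("<4f", *values))
    # struct's "e" format rounds to nearest even; it raises OverflowError where
    # the instruction would instead produce infinity.
    return struct.pack("<4e", *singles)

packed = vcvtps2ph([1.0, 0.5, -2.0, 65504.0])
print(packed.hex())         # 003c003800c0ff7b
print(vcvtph2ps(packed))    # [1.0, 0.5, -2.0, 65504.0]
```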
FMA3
Supported in AMD processors starting with the Piledriver architecture, and in Intel processors starting with Haswell and continuing with Broadwell (2014).
Fused multiply-add (floating-point vector multiply–accumulate) with three operands.
Instruction  Meaning 

VFMADD132PD  Fused Multiply-Add of Packed Double-Precision Floating-Point Values 
VFMADD213PD  Fused Multiply-Add of Packed Double-Precision Floating-Point Values 
VFMADD231PD  Fused Multiply-Add of Packed Double-Precision Floating-Point Values 
VFMADD132PS  Fused Multiply-Add of Packed Single-Precision Floating-Point Values 
VFMADD213PS  Fused Multiply-Add of Packed Single-Precision Floating-Point Values 
VFMADD231PS  Fused Multiply-Add of Packed Single-Precision Floating-Point Values 
VFMADD132SD  Fused Multiply-Add of Scalar Double-Precision Floating-Point Values 
VFMADD213SD  Fused Multiply-Add of Scalar Double-Precision Floating-Point Values 
VFMADD231SD  Fused Multiply-Add of Scalar Double-Precision Floating-Point Values 
VFMADD132SS  Fused Multiply-Add of Scalar Single-Precision Floating-Point Values 
VFMADD213SS  Fused Multiply-Add of Scalar Single-Precision Floating-Point Values 
VFMADD231SS  Fused Multiply-Add of Scalar Single-Precision Floating-Point Values 
VFMADDSUB132PD  Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values 
VFMADDSUB213PD  Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values 
VFMADDSUB231PD  Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values 
VFMADDSUB132PS  Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values 
VFMADDSUB213PS  Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values 
VFMADDSUB231PS  Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values 
VFMSUB132PD  Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values 
VFMSUB213PD  Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values 
VFMSUB231PD  Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values 
VFMSUB132PS  Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values 
VFMSUB213PS  Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values 
VFMSUB231PS  Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values 
VFMSUB132SD  Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values 
VFMSUB213SD  Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values 
VFMSUB231SD  Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values 
VFMSUB132SS  Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
VFMSUB213SS  Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
VFMSUB231SS  Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
VFMSUBADD132PD  Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values 
VFMSUBADD213PD  Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values 
VFMSUBADD231PD  Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values 
VFMSUBADD132PS  Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values 
VFMSUBADD213PS  Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values 
VFMSUBADD231PS  Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values 
VFNMADD132PD  Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values 
VFNMADD213PD  Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values 
VFNMADD231PD  Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values 
VFNMADD132PS  Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values 
VFNMADD213PS  Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values 
VFNMADD231PS  Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values 
VFNMADD132SD  Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values 
VFNMADD213SD  Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values 
VFNMADD231SD  Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values 
VFNMADD132SS  Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values 
VFNMADD213SS  Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values 
VFNMADD231SS  Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values 
VFNMSUB132PD  Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values 
VFNMSUB213PD  Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values 
VFNMSUB231PD  Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values 
VFNMSUB132PS  Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values 
VFNMSUB213PS  Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values 
VFNMSUB231PS  Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values 
VFNMSUB132SD  Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values 
VFNMSUB213SD  Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values 
VFNMSUB231SD  Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values 
VFNMSUB132SS  Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
VFNMSUB213SS  Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
VFNMSUB231SS  Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
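The 132, 213 and 231 suffixes name which operands are multiplied and which one is added; the result always overwrites the first operand. The Python sketch below spells out that ordering and the single rounding that makes the operation "fused". It computes a*b + c exactly with fractions.Fraction and rounds once, so it handles finite inputs only; the function names are hypothetical, and on Python 3.13 or later math.fma could replace the helper.

```python
from fractions import Fraction

def fused_mul_add(a: float, b: float, c: float) -> float:
    """a * b + c computed exactly, then rounded once to double (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Operand numbering: 1 = destination (and first source), 2 = second source, 3 = third source.
def vfmadd132(op1: float, op2: float, op3: float) -> float:
    return fused_mul_add(op1, op3, op2)   # op1 = op1 * op3 + op2

def vfmadd213(op1: float, op2: float, op3: float) -> float:
    return fused_mul_add(op2, op1, op3)   # op1 = op2 * op1 + op3

def vfmadd231(op1: float, op2: float, op3: float) -> float:
    return fused_mul_add(op2, op3, op1)   # op1 = op2 * op3 + op1

a, b = 1 + 2.0**-30, 1 - 2.0**-30
print(a * b - 1.0)                  # unfused: the product rounds to 1.0, so this is 0.0
print(fused_mul_add(a, b, -1.0))    # fused: the exact result, -2**-60
```

The three orderings exist so that the input still needed afterwards can stay in a register while the result overwrites whichever input is no longer needed.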
FMA4
Supported in AMD processors starting with the Bulldozer architecture. Not supported by any Intel chip as of 2017.
Fused multiply-add with four operands. FMA4 was realized in hardware before FMA3.
Instruction  Opcode  Meaning  Notes 

VFMADDPD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 69 /r /is4  Fused Multiply-Add of Packed Double-Precision Floating-Point Values  
VFMADDPS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 68 /r /is4  Fused Multiply-Add of Packed Single-Precision Floating-Point Values  
VFMADDSD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 6B /r /is4  Fused Multiply-Add of Scalar Double-Precision Floating-Point Values  
VFMADDSS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 6A /r /is4  Fused Multiply-Add of Scalar Single-Precision Floating-Point Values  
VFMADDSUBPD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 5D /r /is4  Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values  
VFMADDSUBPS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 5C /r /is4  Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values  
VFMSUBADDPD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 5F /r /is4  Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values  
VFMSUBADDPS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 5E /r /is4  Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values  
VFMSUBPD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 6D /r /is4  Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values  
VFMSUBPS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 6C /r /is4  Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values  
VFMSUBSD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 6F /r /is4  Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values  
VFMSUBSS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 6E /r /is4  Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values  
VFNMADDPD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 79 /r /is4  Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values  
VFNMADDPS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 78 /r /is4  Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values  
VFNMADDSD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 7B /r /is4  Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values  
VFNMADDSS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 7A /r /is4  Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values  
VFNMSUBPD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 7D /r /is4  Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values  
VFNMSUBPS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 7C /r /is4  Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values  
VFNMSUBSD xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 7F /r /is4  Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values  
VFNMSUBSS xmm0, xmm1, xmm2, xmm3  C4E3 WvvvvL01 7E /r /is4  Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values 
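FMA4's fourth operand lets the result go to a register that is not also an input, so no 132/213/231 variants are needed. The sketch below models a few of the packed double-precision rows on Python lists, again using exact Fraction arithmetic to get a single rounding (finite inputs only). The alternation rule, subtracting in even-numbered elements and adding in odd-numbered ones for VFMADDSUB and the reverse for VFMSUBADD, is the same as for the corresponding FMA3 instructions; the function names are hypothetical.

```python
from fractions import Fraction

def _fma(a: float, b: float, c: float) -> float:
    """Exact a * b + c with a single rounding to double (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def vfmaddpd(src1, src2, src3):
    """dest = src1 * src2 + src3; dest is a separate, fourth operand."""
    return [_fma(a, b, c) for a, b, c in zip(src1, src2, src3)]

def vfnmaddpd(src1, src2, src3):
    """dest = -(src1 * src2) + src3."""
    return [_fma(-a, b, c) for a, b, c in zip(src1, src2, src3)]

def vfmaddsubpd(src1, src2, src3):
    """Even elements: src1 * src2 - src3; odd elements: src1 * src2 + src3."""
    return [_fma(a, b, -c if i % 2 == 0 else c)
            for i, (a, b, c) in enumerate(zip(src1, src2, src3))]

def vfmsubaddpd(src1, src2, src3):
    """Even elements: src1 * src2 + src3; odd elements: src1 * src2 - src3."""
    return [_fma(a, b, c if i % 2 == 0 else -c)
            for i, (a, b, c) in enumerate(zip(src1, src2, src3))]

print(vfmaddsubpd([1.0, 2.0], [3.0, 4.0], [10.0, 10.0]))  # [-7.0, 18.0]
```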
AVX
AVX was first supported by Intel with Sandy Bridge and by AMD with Bulldozer.
Vector operations on 256-bit registers.
Instruction  Description 

VBROADCASTSS  Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of an XMM or YMM vector register. 
VBROADCASTSD  
VBROADCASTF128  
VINSERTF128  Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged. 
VEXTRACTF128  Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand. 
VMASKMOVPS  Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do nothing. This appears to be a design flaw.^{[12]} 
VMASKMOVPD  
VPERMILPS  Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all 256 bits with two separate 128-bit shuffles, so they cannot shuffle across the 128-bit lanes.^{[13]} 
VPERMILPD  
VPERM2F128  Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector. 
VZEROALL  Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use. 
VZEROUPPER  Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use. 
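The in-lane restriction described for VPERMILPS, and the lane-crossing VPERM2F128 that complements it, can be sketched in Python with eight-element lists standing in for YMM registers. Only the immediate forms are modelled, element values can be any Python objects, and the function names are hypothetical.

```python
def vpermilps_imm(src: list, imm8: int) -> list:
    """Model of VPERMILPS ymm, ymm/m256, imm8 on eight 32-bit elements.

    The same four 2-bit selectors are applied inside each 128-bit lane,
    so a selector can only pick elements from its own lane.
    """
    if len(src) != 8:
        raise ValueError("expected eight elements")
    out = []
    for lane_base in (0, 4):
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0b11
            out.append(src[lane_base + sel])
    return out

def vperm2f128(a: list, b: list, imm8: int) -> list:
    """Model of VPERM2F128 on two 8-element vectors.

    Each destination half picks one of the four source halves (a.lo, a.hi,
    b.lo, b.hi) with bits 1:0 or 5:4, or is zeroed if bit 3 or bit 7 is set.
    """
    halves = [a[:4], a[4:], b[:4], b[4:]]

    def pick(ctrl: int) -> list:
        return [0] * 4 if ctrl & 0b1000 else list(halves[ctrl & 0b11])

    return pick(imm8 & 0xF) + pick((imm8 >> 4) & 0xF)

x = vpermilps_imm(list(range(8)), 0x1B)   # [3, 2, 1, 0, 7, 6, 5, 4]
print(vperm2f128(x, x, 0x01))             # [7, 6, 5, 4, 3, 2, 1, 0]
```

Reversing all eight elements therefore takes two steps: VPERMILPS with imm8 = 0x1B reverses each lane, and VPERM2F128 with imm8 = 0x01 swaps the lanes.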
AVX2
Introduced in Intel's Haswell microarchitecture and AMD's Excavator.
Expansion of most vector integer SSE and AVX instructions to 256 bits
Instruction  Description 

VBROADCASTSS  Copy a 32-bit or 64-bit register operand to all elements of an XMM or YMM vector register. These are register versions of the same instructions in AVX1. There is no register version of VBROADCASTF128, but the same effect can be achieved using VINSERTF128. 
VBROADCASTSD  
VPBROADCASTB  Copy an 8-, 16-, 32- or 64-bit integer register or memory operand to all elements of an XMM or YMM vector register. 
VPBROADCASTW  
VPBROADCASTD  
VPBROADCASTQ  
VBROADCASTI128  Copy a 128-bit memory operand to both 128-bit halves of a YMM vector register. 
VINSERTI128  Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged. 
VEXTRACTI128  Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand. 
VGATHERDPD  Gathers single- or double-precision floating-point values using either 32- or 64-bit indices and scale. 
VGATHERQPD  
VGATHERDPS  
VGATHERQPS  
VPGATHERDD  Gathers 32- or 64-bit integer values using either 32- or 64-bit indices and scale. 
VPGATHERDQ  
VPGATHERQD  
VPGATHERQQ  
VPMASKMOVD  Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. 
VPMASKMOVQ  
VPERMPS  Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register operand supplying the indices. 
VPERMD  
VPERMPD  Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with an immediate constant as selector. 
VPERMQ  
VPERM2I128  Shuffle (two of) the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector. 
VPBLENDD  Doubleword immediate version of the PBLEND instructions from SSE4. 
VPSLLVD  Shift left logical. Allows variable shifts where each element is shifted according to the packed input. 
VPSLLVQ  
VPSRLVD  Shift right logical. Allows variable shifts where each element is shifted according to the packed input. 
VPSRLVQ  
VPSRAVD  Shift right arithmetically. Allows variable shifts where each element is shifted according to the packed input. 
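The per-element shift counts of VPSLLVD, VPSRLVD and VPSRAVD come with one detail worth modelling: a count of 32 or more yields 0 for the logical shifts and fills the element with its sign bit for the arithmetic shift. The Python sketch below models the 32-bit forms on lists of unsigned 32-bit values; the function names are hypothetical.

```python
M32 = 0xFFFFFFFF

def vpsllvd(a: list[int], counts: list[int]) -> list[int]:
    """Per-element logical left shift; counts above 31 give 0."""
    return [((x << c) & M32) if c < 32 else 0 for x, c in zip(a, counts)]

def vpsrlvd(a: list[int], counts: list[int]) -> list[int]:
    """Per-element logical right shift; counts above 31 give 0."""
    return [((x & M32) >> c) if c < 32 else 0 for x, c in zip(a, counts)]

def vpsravd(a: list[int], counts: list[int]) -> list[int]:
    """Per-element arithmetic right shift; counts above 31 replicate the sign bit."""
    out = []
    for x, c in zip(a, counts):
        signed = x - (1 << 32) if x & 0x80000000 else x   # reinterpret as int32
        out.append((signed >> min(c, 31)) & M32)           # a count above 31 acts like 31
    return out

print([hex(v) for v in vpsravd([0x80000000, 8], [1, 1])])  # ['0xc0000000', '0x4']
```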
AVX-512
Introduced in Intel's Xeon Phi x200
Vector operations on 512-bit registers.
AVX-512 Foundation
Instruction  Description 

VBLENDMPD  Blend float64 vectors using opmask control 
VBLENDMPS  Blend float32 vectors using opmask control 
VPBLENDMD  Blend int32 vectors using opmask control 
VPBLENDMQ  Blend int64 vectors using opmask control 
VPCMPD  Compare signed/unsigned doublewords into mask 
VPCMPUD  
VPCMPQ  Compare signed/unsigned quadwords into mask 
VPCMPUQ  
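VPTESTMD  Logical AND of 32- or 64-bit integer elements; set the mask bit for each element whose result is nonzero. 
VPTESTMQ  
VPTESTNMD  Logical AND of 32- or 64-bit integer elements; set the mask bit for each element whose result is zero. 
VPTESTNMQ  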
VCOMPRESSPD  Store sparse packed double/single-precision floating-point values into dense memory 
VCOMPRESSPS  
VPCOMPRESSD  Store sparse packed doubleword/quadword integer values into dense memory/register 
VPCOMPRESSQ  
VEXPANDPD  Load sparse packed double/single-precision floating-point values from dense memory 
VEXPANDPS  
VPEXPANDD  Load sparse packed doubleword/quadword integer values from dense memory/register 
VPEXPANDQ  
VPERMI2PD  Full single/double floating-point permute overwriting the index. 
VPERMI2PS  
VPERMI2D  Full doubleword/quadword permute overwriting the index. 
VPERMI2Q  
VPERMT2PS  Full single/double floating-point permute overwriting first source. 
VPERMT2PD  
VPERMT2D  Full doubleword/quadword permute overwriting first source. 
VPERMT2Q  
VSHUFF32x4  Shuffle four packed 128-bit lanes. 
VSHUFF64x2  
VSHUFI32x4  
VSHUFI64x2  
VPTERNLOGD  Bitwise Ternary Logic 
VPTERNLOGQ  
VPMOVQD  Down-convert quadword or doubleword to doubleword, word or byte; unsaturated, saturated or saturated unsigned. The reverse of the sign/zero-extend instructions from SSE4.1. 
VPMOVSQD  
VPMOVUSQD  
VPMOVQW  
VPMOVSQW  
VPMOVUSQW  
VPMOVQB  
VPMOVSQB  
VPMOVUSQB  
VPMOVDW  
VPMOVSDW  
VPMOVUSDW  
VPMOVDB  
VPMOVSDB  
VPMOVUSDB  
VCVTPS2UDQ  Convert with or without truncation, packed single- or double-precision floating point to packed unsigned doubleword integers. 
VCVTPD2UDQ  
VCVTTPS2UDQ  
VCVTTPD2UDQ  
VCVTSS2USI  Convert with or without truncation, scalar single- or double-precision floating point to unsigned doubleword integer. 
VCVTSD2USI  
VCVTTSS2USI  
VCVTTSD2USI  
VCVTUDQ2PS  Convert packed unsigned doubleword integers to packed single- or double-precision floating point. 
VCVTUDQ2PD  
VCVTUSI2SS  Convert a scalar unsigned doubleword or quadword integer to single- or double-precision floating point. 
VCVTUSI2SD  
VCVTQQ2PD  Convert packed quadword integers to packed single- or double-precision floating point. 
VCVTQQ2PS  
VGETEXPPD  Convert exponents of packed fp values into fp values 
VGETEXPPS  
VGETEXPSD  Convert exponent of scalar fp value into fp value 
VGETEXPSS  
VGETMANTPD  Extract vector of normalized mantissas from float32/float64 vector 
VGETMANTPS  
VGETMANTSD  Extract float32/float64 of normalized mantissa from float32/float64 scalar 
VGETMANTSS  
VFIXUPIMMPD  Fix up special packed float32/float64 values 
VFIXUPIMMPS  
VFIXUPIMMSD  Fix up special scalar float32/float64 value 
VFIXUPIMMSS  
VRCP14PD  Compute approximate reciprocals of packed float32/float64 values 
VRCP14PS  
VRCP14SD  Compute approximate reciprocals of scalar float32/float64 value 
VRCP14SS  
VRNDSCALEPS  Round packed float32/float64 values to include a given number of fraction bits 
VRNDSCALEPD  
VRNDSCALESS  Round scalar float32/float64 value to include a given number of fraction bits 
VRNDSCALESD  
VRSQRT14PD  Compute approximate reciprocals of square roots of packed float32/float64 values 
VRSQRT14PS  
VRSQRT14SD  Compute approximate reciprocal of square root of scalar float32/float64 value 
VRSQRT14SS  
VSCALEFPS  Scale packed float32/float64 values with float32/float64 values 
VSCALEFPD  
VSCALEFSS  Scale scalar float32/float64 value with float32/float64 value 
VSCALEFSD  
VALIGND  Align doubleword or quadword vectors 
VALIGNQ  
VPABSQ  Packed absolute value quadword 
VPMAXSQ  Maximum of packed signed/unsigned quadword 
VPMAXUQ  
VPMINSQ  Minimum of packed signed/unsigned quadword 
VPMINUQ  
VPROLD  Bit rotate left or right 
VPROLVD  
VPROLQ  
VPROLVQ  
VPRORD  
VPRORVD  
VPRORQ  
VPRORVQ  
VPSCATTERDD  Scatter packed doubleword/quadword with signed doubleword and quadword indices 
VPSCATTERDQ  
VPSCATTERQD  
VPSCATTERQQ  
VSCATTERDPS  Scatter packed float32/float64 with signed doubleword and quadword indices 
VSCATTERDPD  
VSCATTERQPS  
VSCATTERQPD 
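Two of the more unusual AVX-512 Foundation rows are easy to pin down in code. VPTERNLOGD evaluates an arbitrary three-input Boolean function given as an 8-entry truth table in imm8, indexed per bit by (dest << 2) | (src1 << 1) | src2; VPCOMPRESSD and VPEXPANDD move the elements selected by an opmask to or from consecutive positions. The Python sketch below models the 32-bit forms on lists of unsigned integers, with opmasks as plain integers and unselected elements zeroed (zero-masking is an assumption; merge-masking would keep the old destination values). The function names are hypothetical.

```python
def vpternlogd(dest, src1, src2, imm8):
    """Per bit: result = imm8[(dest_bit << 2) | (src1_bit << 1) | src2_bit]."""
    out = []
    for a, b, c in zip(dest, src1, src2):
        r = 0
        for bit in range(32):
            idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
            r |= ((imm8 >> idx) & 1) << bit
        out.append(r)
    return out

def vpcompressd(src, k):
    """Register form with zero-masking: selected elements packed to the bottom."""
    picked = [v for i, v in enumerate(src) if (k >> i) & 1]
    return picked + [0] * (len(src) - len(picked))

def vpexpandd(dense, k, width):
    """Zero-masking: consecutive elements of dense go to the positions set in k."""
    out, j = [], 0
    for i in range(width):
        if (k >> i) & 1:
            out.append(dense[j])
            j += 1
        else:
            out.append(0)
    return out

print(vpcompressd([10, 11, 12, 13], 0b1010))  # [11, 13, 0, 0]
print(vpexpandd([11, 13], 0b1010, 4))         # [0, 11, 0, 13]
```

Evaluating the desired expression on the constants A = 0xF0, B = 0xCC and C = 0xAA gives its imm8 directly; for example 0xF0 ^ 0xCC ^ 0xAA = 0x96 selects a three-way XOR.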
Cryptographic instructions
Intel AES instructions
6 new instructions.
Instruction  Description 

AESENC  Perform one round of an AES encryption flow 
AESENCLAST  Perform the last round of an AES encryption flow 
AESDEC  Perform one round of an AES decryption flow 
AESDECLAST  Perform the last round of an AES decryption flow 
AESKEYGENASSIST  Assist in AES round key generation 
AESIMC  Assist in AES Inverse Mix Columns 
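AESENC and AESENCLAST each perform a single AES round, so software composes them into a full cipher. The Python sketch below models that flow for AES-128: an initial XOR with round key 0, nine AESENC rounds, and a final AESENCLAST, which omits MixColumns. It is a teaching model of the round structure only, assuming the FIPS-197 byte layout; the round keys come from a plain-Python FIPS-197 key expansion rather than from AESKEYGENASSIST, all function names are hypothetical, and the code is neither fast nor constant-time.

```python
def xtime(b: int) -> int:
    """Multiply by x (that is, 2) in GF(2^8) modulo the AES polynomial 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def build_sbox() -> list[int]:
    """S-box = multiplicative inverse followed by the FIPS-197 affine transform."""
    box = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        box.append(s ^ 0x63)
    return box

SBOX = build_sbox()

def shift_rows(s: list[int]) -> list[int]:
    # State is column-major: byte index = row + 4 * column; row r rotates left by r.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def mix_columns(s: list[int]) -> list[int]:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c: 4 * c + 4]
        out += [gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
                a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
                a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
                gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2)]
    return out

def aesenc(state: list[int], round_key: list[int]) -> list[int]:
    """One full round: SubBytes, ShiftRows, MixColumns, then XOR the round key."""
    t = mix_columns(shift_rows([SBOX[b] for b in state]))
    return [x ^ k for x, k in zip(t, round_key)]

def aesenclast(state: list[int], round_key: list[int]) -> list[int]:
    """Last round: like aesenc but without MixColumns."""
    t = shift_rows([SBOX[b] for b in state])
    return [x ^ k for x, k in zip(t, round_key)]

def expand_key_128(key: bytes) -> list[list[int]]:
    """FIPS-197 key expansion: 11 round keys of 16 bytes each."""
    rcon = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
    w = [list(key[4 * i: 4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]   # RotWord, then SubWord
            t[0] ^= rcon[i // 4 - 1]
        w.append([x ^ y for x, y in zip(w[i - 4], t)])
    return [sum(w[4 * r: 4 * r + 4], []) for r in range(11)]

def aes128_encrypt_block(key: bytes, block: bytes) -> bytes:
    rk = expand_key_128(key)
    state = [b ^ k for b, k in zip(block, rk[0])]   # initial AddRoundKey
    for r in range(1, 10):
        state = aesenc(state, rk[r])                # nine AESENC rounds
    return bytes(aesenclast(state, rk[10]))         # one AESENCLAST round
```

With the FIPS-197 Appendix C.1 key 000102...0f and plaintext 00112233...ff, this model should produce the published ciphertext 69c4e0d86a7b0430d8cdb78070b4c55a.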
RDRAND and RDSEED
Instruction  Description 

RDRAND  Read Random Number 
RDSEED  Read Random Seed 
Intel SHA instructions
7 new instructions.
Instruction  Description 

SHA1RNDS4  Perform Four Rounds of SHA1 Operation 
SHA1NEXTE  Calculate SHA1 State Variable E after Four Rounds 
SHA1MSG1  Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords 
SHA1MSG2  Perform a Final Calculation for the Next Four SHA1 Message Dwords 
SHA256RNDS2  Perform Two Rounds of SHA256 Operation 
SHA256MSG1  Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords 
SHA256MSG2  Perform a Final Calculation for the Next Four SHA256 Message Dwords 
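SHA256MSG1 and SHA256MSG2 split the SHA-256 message-schedule recurrence W[t] = σ1(W[t-2]) + W[t-7] + σ0(W[t-15]) + W[t-16] (from FIPS 180-4) into two halves, four words at a time. The Python sketch below models that split on plain lists of 32-bit words. It assumes the W[t-7] term is added by ordinary vector code between the two instructions, ignores how the words are laid out inside the XMM registers, and uses hypothetical function names.

```python
M32 = 0xFFFFFFFF

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & M32

def sigma0(x: int) -> int:   # FIPS 180-4 small sigma 0
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x: int) -> int:   # FIPS 180-4 small sigma 1
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def sha256msg1(w: list[int], t: int) -> list[int]:
    """Intermediate step: W[t-16+i] + sigma0(W[t-15+i]) for i = 0..3."""
    return [(w[t - 16 + i] + sigma0(w[t - 15 + i])) & M32 for i in range(4)]

def sha256msg2(partial: list[int], w: list[int], t: int) -> list[int]:
    """Final step: add sigma1(W[t-2+i]); the last two words need the first two."""
    out: list[int] = []
    for i in range(4):
        prev = w[t - 2 + i] if i < 2 else out[i - 2]
        out.append((partial[i] + sigma1(prev)) & M32)
    return out

def next_four_words(w: list[int], t: int) -> list[int]:
    """Compute W[t..t+3] the way the two instructions divide the work."""
    partial = sha256msg1(w, t)
    partial = [(p + w[t - 7 + i]) & M32 for i, p in enumerate(partial)]  # ordinary add
    return sha256msg2(partial, w, t)
```

The dependency inside sha256msg2 is the reason the work is split this way: W[t+2] and W[t+3] need W[t] and W[t+1], which are produced by the same instruction.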
Undocumented instructions
Undocumented x86 instructions
x86 CPUs contain undocumented instructions that are implemented on the chips but omitted from some official documentation. They are described in various sources across the Internet, such as Ralf Brown's Interrupt List and sandpile.org.
Mnemonic  Opcode  Description  Status 

AAM imm8  D4 imm8  Divide AL by imm8, put the quotient in AH, and the remainder in AL  Available beginning with 8086, documented since Pentium (earlier documentation lists no arguments) 
AAD imm8  D5 imm8  Multiplication counterpart of AAM: set AL to AL + AH × imm8 (truncated to 8 bits) and AH to 0  Available beginning with 8086, documented since Pentium (earlier documentation lists no arguments) 
SALC  D6  Set AL depending on the value of the Carry Flag (a 1-byte alternative of SBB AL, AL)  Available beginning with 8086, but only documented since Pentium Pro. 
ICEBP  F1  Single-byte single-step exception / Invoke ICE  Available beginning with 80386, documented (as INT1) since Pentium Pro 
Unknown mnemonic  0F 04  Exact purpose unknown; causes a CPU hang (HCF), from which the only way out is a CPU reset.^{[14]} In some implementations, emulated through BIOS as a halting sequence.^{[15]} In a forum post at the Vintage Computing Federation, this instruction is explained as SAVEALL; it interacts with ICE mode.  Only available on 80286 
LOADALL  0F 05  Loads All Registers from Memory Address 0x000800  Only available on 80286 
LOADALLD  0F 07  Loads All Registers from Memory Address ES:EDI  Only available on 80386 
UD1  0F B9  Intentionally undefined instruction, but unlike UD2 this was not published  
ALTINST  0F 3F  Jump and execute instructions in the undocumented Alternate Instruction Set.  Only available on some x86 processors made by VIA Technologies. 
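The arithmetic of the AAM imm8, AAD imm8 and SALC rows can be expressed directly. This minimal Python sketch models only the register results, not the flags; AX is a 16-bit integer and the function names are hypothetical. A zero immediate makes AAM raise a divide error (#DE) on hardware, which the sketch mirrors with ZeroDivisionError.

```python
def aam(ax: int, imm8: int = 10) -> int:
    """AAM imm8: AH = AL // imm8, AL = AL % imm8; returns the new AX."""
    al = ax & 0xFF
    if imm8 == 0:
        raise ZeroDivisionError("AAM with imm8 = 0 raises #DE")
    return ((al // imm8) << 8) | (al % imm8)

def aad(ax: int, imm8: int = 10) -> int:
    """AAD imm8: AL = (AL + AH * imm8) & 0xFF, AH = 0; returns the new AX."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    return (al + ah * imm8) & 0xFF

def salc(cf: bool) -> int:
    """SALC: AL = 0xFF if CF is set, else 0x00 (same result as SBB AL, AL)."""
    return 0xFF if cf else 0x00

print(hex(aam(63)))       # 0x603: AH = 6, AL = 3
print(hex(aam(0xFF, 16))) # 0xf0f: the two hex nibbles of AL
```

With imm8 = 16, AAM splits AL into its two hexadecimal digits, which is one use of the non-decimal forms.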
Undocumented x87 instructions
FFREEP performs FFREE ST(i) and then pops the register stack.
See also
 CLMUL
 RDRAND
 Larrabee extensions
 Advanced Vector Extensions 2
 Bit Manipulation Instruction Sets
 CPUID
References
 "Re: Intel Processor Identification and the CPUID Instruction". http://www.intel.com/content/www/us/en/processors/processoridentificationcpuidinstructionnote.html?wapkw=processoridentificationcpuidinstruction. Retrieved 2013-04-21.
 Toth, Ervin (1998-03-16). "BSWAP with 16-bit registers". Archived from the original on 1999-11-03. https://web.archive.org/web/19991103025640/http://www.df.lth.se/~john_e/gems/gem000c.html. "The instruction brings down the upper word of the doubleword register without affecting its upper 16 bits."
 Coldwind, Gynvael (2009-12-29). "BSWAP + 66h prefix". https://gynvael.coldwind.pl/?id=268. Retrieved 2018-10-03. "internal (zero)extending the value of a smaller (16bit) register … applying the bswap to a 32bit value "00 00 AH AL", … truncated to lower 16bits, which are "00 00". … Bochs … bswap reg16 acts just like the bswap reg32 … QEMU … ignores the 66h prefix"
 "RSM—Resume from System Management Mode". Archived from the original on 2012-03-12. https://web.archive.org/web/20120312224625/http://www.softeng.rl.ac.uk/st/archive/SoftEng/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc279.htm.
 Intel 64 and IA-32 Architectures Optimization Reference Manual, section 7.3.2.
 Intel 64 and IA-32 Architectures Software Developer’s Manual, section 4.3, subsection "PREFETCHh—Prefetch Data Into Caches".
 Hollingsworth, Brent. "New "Bulldozer" and "Piledriver" instructions" (PDF). Advanced Micro Devices, Inc. http://amddev.wpengine.netdnacdn.com/wordpress/media/2012/10/NewBulldozerandPiledriverInstructions.pdf. Retrieved 2014-12-11.
 "Family 16h AMD A-Series Data Sheet" (PDF). amd.com. AMD. October 2013. http://support.amd.com/TechDocs/52169_KB_A_Series_Mobile.pdf. Retrieved 2014-01-02.
 "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions" (PDF). amd.com. AMD. October 2013. http://support.amd.com/TechDocs/24594.pdf. Retrieved 2014-01-02.
 "tbmintrin.h from GCC 4.8". https://gcc.gnu.org/viewcvs/gcc/branches/gcc4_8branch/gcc/config/i386/tbmintrin.h?revision=196696&view=markup. Retrieved 2014-03-17.
 Intel 64 and IA-32 Architectures Optimization Reference Manual, section 3.5.2.3. https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64ia32architecturesoptimizationmanual.pdf.
 "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers". http://www.agner.org/optimize/microarchitecture.pdf. Retrieved 2016-10-17.
 "Chess programming AVX2". https://chessprogramming.wikispaces.com/AVX2. Retrieved 2016-10-17.
