Initial version: 2025-03-12
Last update: 2025-04-01
Comparison of 8087 and 8086 Clock Times

                            | Approximate execution time (in µs)
                            +-------------+---------------------
Instruction                 | 8087        | 8086 Emulation
                            | 8 MHz clock |
----------------------------+-------------+---------------------
Add/Subtract                | 10          | 1000
Multiply (single precision) | 11.9        | 1000
Multiply (double precision) | 16.0        | 1312
Divide (single precision)   | 24.4        | 2000
Compare                     | 5.6         | 812
Load (double precision)     | 6.3         | 1062
Store (double precision)    | 13.1        | 750
Square root                 | 22.5        | 12250
Tangent                     | 56.3        | 8125
Exponentiation              | 62.5        | 10687
MM0 through MM7, and operations that operate on them. To avoid compatibility problems with the context switch mechanisms in existing operating systems, the MMX registers are aliases for the existing x87 floating-point unit (FPU) registers, which context switches would already save and restore. However, unlike the x87 registers, which behave like a stack, the MMX registers are each directly addressable (random access). Each 64-bit MMX register corresponds to the mantissa part of an 80-bit x87 register. The upper 16 bits of the x87 registers thus go unused in MMX, and these bits are all set to ones, making them Not a Number (NaN) data types, or infinities in the floating-point representation.
8 bit case:
   11001000_(2) = 200_(10)
   11001000_(2) = 200_(10)
  ------------------------ ADD
  110010000_(2) = 400_(10) exact result
       |
       | wraparound by truncation
       V
  X10010000_(2) = 144_(10) = 400 mod 256
  |
  truncate
8 bit case:
   11001000_(2) = 200_(10)
   11001000_(2) = 200_(10)
  ------------------------ ADD
  110010000_(2) = 400_(10) exact result
       |
       | maximum value on 8 bits
       | is 255
       V
   11111111_(2) = 255_(10)
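The two 8-bit cases above can be reproduced with a few lines of Python. This is only a sketch: the function names `add_wraparound` and `add_saturating` are made up for illustration, and the values are treated as unsigned integers.

```python
def add_wraparound(a: int, b: int, bits: int = 8) -> int:
    """Unsigned add that keeps only the low `bits` bits (result mod 2**bits)."""
    return (a + b) & ((1 << bits) - 1)


def add_saturating(a: int, b: int, bits: int = 8) -> int:
    """Unsigned add that clamps the result to the largest representable value."""
    return min(a + b, (1 << bits) - 1)


print(add_wraparound(200, 200))  # 144, that is 400 mod 256
print(add_saturating(200, 200))  # 255, the maximum value on 8 bits
```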
XMM0 through XMM7. The AMD64 extensions from AMD (originally called x86-64) added a further eight registers XMM8 through XMM15, and this extension is duplicated in the Intel 64 architecture. There is also a new 32-bit control/status register, MXCSR. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode.
FXSAVE and FXRSTOR instructions, an extended pair of instructions that can save and restore the x87, MMX, and SSE register state at once. Support for them was quickly added to all major IA-32 operating systems.
XMM0–XMM7 to YMM0–YMM7 (in x86-64 mode, from XMM0–XMM15 to YMM0–YMM15). Each YMM register can hold and do simultaneous operations on:
a := a + b can now use a non-destructive three-operand form c := a + b, preserving both source operands. This instruction form is typical of RISC architectures.
OPERAND       OPERAND
   |  OPERATOR   |
   |     |       |
  123    +      321
You may ask: "Are there any alternatives?" Yes, there are. The operator may precede its operands, or it may follow them:
OPERATOR OPERAND OPERAND
   |        |       |
   +       123     321

OPERAND OPERAND OPERATOR
   |       |       |
  123     321      +
In the first case, where the operator precedes its operands, you have prefix notation. In the second, where the operator follows its operands, you have postfix notation.
Infix expression: (6-2)*5
RPN expression : 6 2 - 5 *
1. token := next_token(expression)  // token = 6
2. Because token is a number, push it onto the stack  // STACK: 6
3. token := next_token(expression)  // token = 2
4. Because token is a number, push it onto the stack  // STACK: 6 2
5. token := next_token(expression)  // token = -
6. Because token is a two-operand operator, take two numbers
   from the stack and subtract the first from the second:
     op1 := pop()      // op1 = 2, STACK: 6
     op2 := pop()      // op2 = 6, STACK: (empty)
     tmp := op2 - op1  // tmp := 6 - 2
     push(tmp)         // STACK: 4
7. token := next_token(expression)  // token = 5
8. Because token is a number, push it onto the stack  // STACK: 4 5
9. token := next_token(expression)  // token = *
10. Because token is a two-operand operator, take two numbers
    from the stack and multiply the second by the first:
     op1 := pop()      // op1 = 5, STACK: 4
     op2 := pop()      // op2 = 4, STACK: (empty)
     tmp := op2 * op1  // tmp := 4 * 5
     push(tmp)         // STACK: 20
11. token := next_token(expression)  // token = NULL
    End of calculation.
12. The result is on the top of the stack
     result := pop()
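The step-by-step evaluation above can be condensed into a short Python function. This is a minimal sketch, assuming space-separated tokens and only the four binary operators; the name `eval_rpn` is chosen for illustration.

```python
import operator

# Binary operators supported by this sketch.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression: str) -> float:
    """Evaluate a space-separated RPN expression using an explicit stack."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            op1 = stack.pop()  # pushed last: right-hand operand
            op2 = stack.pop()  # pushed first: left-hand operand
            stack.append(OPERATORS[token](op2, op1))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack.pop()


print(eval_rpn("6 2 - 5 *"))  # 20.0
```

Note the operand order: the value popped first is the right-hand operand, which matters for subtraction and division.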
As you can see, this is a very elegant and efficient method of calculation. Any standard infix arithmetic expression can easily be converted to an RPN expression. A well-known algorithm for converting from infix to postfix notation is Dijkstra's shunting-yard algorithm. It uses a queue and a stack to do the conversion, and implementing it is good programming practice for the study of data structures.
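A possible shape of the infix-to-postfix conversion is sketched below in Python. It is a simplified version of the shunting-yard idea, under stated assumptions: only the left-associative operators + - * /, parentheses, and unsigned numbers are handled (no unary minus, no functions). The names `tokenize` and `infix_to_rpn` are illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(expression: str) -> list:
    """Split an infix string into numbers, operators and parentheses."""
    tokens, number = [], ""
    for ch in expression:
        if ch.isdigit() or ch == ".":
            number += ch
            continue
        if number:
            tokens.append(number)
            number = ""
        if ch in PRECEDENCE or ch in "()":
            tokens.append(ch)
        elif not ch.isspace():
            raise ValueError(f"unexpected character {ch!r}")
    if number:
        tokens.append(number)
    return tokens


def infix_to_rpn(expression: str) -> str:
    output, stack = [], []  # output queue and operator stack
    for token in tokenize(expression):
        if token in PRECEDENCE:
            # Pop operators of higher or equal precedence (left associativity).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("mismatched parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(token)  # operands go straight to the output
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched parentheses")
        output.append(stack.pop())
    return " ".join(output)


print(infix_to_rpn("(6-2)*5"))  # 6 2 - 5 *
```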
5 + 2 * 4
(5 + (2 * 4))
.
(5 (2 4 x ) +)
.
7 8 6 * +
infix: (3 + 5) / (9 - 5)
RPN: 3 5 + 9 5 - /
infix: 28 / (6 + 2 * 4)
RPN: 28 6 2 4 * + /
infix: (5 + 7) / ((8 - 6) * 3)
RPN: 5 7 + 8 6 - 3 * /
R0 to R7; note that R0-R7 are internal names that cannot be used by the programmer; ST(0)-ST(7) are used instead, as explained later) and the following special-purpose registers:
IE, Invalid Operation, bit 0
DE, Denormalized Operand, bit 1
ZE, Zero Divide, bit 2
OE, Overflow, bit 3
UE, Underflow, bit 4
PE, Precision, bit 5
SF, Stack Fault Flag, bit 6. The FPU sets the SF flag when it detects a stack overflow or underflow condition, but it does not explicitly clear the flag when it detects an invalid-arithmetic-operand condition. When this flag is set, the condition code flag C1 indicates the nature of the fault: overflow (C1 = 1) or underflow (C1 = 0). The SF flag is a "sticky" flag, meaning that after it is set, the processor does not clear it until it is explicitly instructed to do so (for example, by an FINIT/FNINIT instruction).
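ES, Error Summary Status, bit 7. The x87 FPU detects six classes of exception conditions:
Invalid operation (#I), with two subclasses:
  Stack overflow or underflow (#IS)
  Invalid arithmetic operation (#IA)
Denormalized operand (#D)
Divide-by-zero (#Z)
Numeric overflow (#O)
Numeric underflow (#U)
Inexact result (precision) (#P)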
C0-C3, Condition Code, bits 8, 9, 10 and 14. The four condition code flags indicate the results of floating-point comparison and arithmetic operations. These condition code bits are used principally for conditional branching and for storage of information used in exception handling.
TOP, Top of Stack Pointer, bits 11 through 13. TOP is a pointer to the FPU data register that is currently at the top of the FPU register stack. This pointer is a binary value from 0 to 7.
B, FPU busy, bit 15.
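The bit positions listed above are enough to split a raw 16-bit status word into its fields. The Python sketch below does exactly that; the function name `decode_status_word` and the sample value 0x3804 are chosen only for illustration.

```python
# Names of the flags held in bits 0-7 of the status word.
EXCEPTION_FLAGS = ["IE", "DE", "ZE", "OE", "UE", "PE", "SF", "ES"]


def decode_status_word(sw: int) -> dict:
    """Split a 16-bit x87 status word into its named fields."""
    flags = [name for bit, name in enumerate(EXCEPTION_FLAGS)
             if (sw >> bit) & 1]
    return {
        "flags": flags,
        "C0": (sw >> 8) & 1,
        "C1": (sw >> 9) & 1,
        "C2": (sw >> 10) & 1,
        "C3": (sw >> 14) & 1,
        "TOP": (sw >> 11) & 7,  # three bits: 11 through 13
        "B": (sw >> 15) & 1,
    }


fields = decode_status_word(0x3804)
print(fields["flags"], fields["TOP"])  # ['ZE'] 7
```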
Table: Rounding Modes and Encoding of Rounding Control (RC) Field
Rounding Mode | RC Field Setting | Description
| (binary) |
-----------------+------------------+-------------
Round to nearest | 00 | Rounded result is the closest to the infinitely
(even) | | precise result. If two values are equally close,
| | the result is the even value (that is, the one with
| | the least-significant bit of zero). Default mode.
| |
Round down | 01 | Rounded result is closest to but no greater than
| | the infinitely precise result.
| |
Round up | 10 | Rounded result is closest to but no less than the
| | infinitely precise result.
| |
Round toward zero| 11 | Rounded result is closest to but no greater in absolute
(Truncate) | | value than the infinitely precise result.
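To get a feel for the four modes, the following Python sketch maps each RC value to a standard-library function with the same behaviour when a value is rounded to an integer. This is only an analogy: on the FPU the RC field also governs rounding of the significand of arithmetic results, which this sketch does not model. The name `round_with_rc` is made up for illustration.

```python
import math

# RC field value -> Python function with the matching integer rounding.
ROUNDING = {
    0b00: round,       # round to nearest, ties go to the even value
    0b01: math.floor,  # round down (toward minus infinity)
    0b10: math.ceil,   # round up (toward plus infinity)
    0b11: math.trunc,  # round toward zero (truncate)
}


def round_with_rc(value: float, rc: int) -> int:
    """Round `value` to an integer the way the given RC setting would."""
    return ROUNDING[rc](value)


for rc in ROUNDING:
    print(f"RC={rc:02b}:", [round_with_rc(v, rc) for v in (7.5, 2.5, -1.5)])
# RC=00: [8, 2, -2]
# RC=01: [7, 2, -2]
# RC=10: [8, 3, -1]
# RC=11: [7, 2, -1]
```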
bit 79   bits 78-64   bits 63-0
S        EEEE         CCCCCCCCC
|        |            |
|        |            significand or coefficient (64 bits)
|        |
|        exponent (15 bits)
|
sign (1 bit)
When floating-point, integer, or packed BCD integer values are loaded from memory into any of the FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format). When computation results are subsequently transferred back into memory from any of the x87 FPU registers, the results can be left in the double extended-precision floating-point format or converted back into a shorter floating-point format, an integer format, or the packed BCD integer format.
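As an illustration of the 80-bit layout, the Python sketch below decodes a 10-byte little-endian value into an exact fraction. It relies on two facts that are not spelled out above and should be read as assumptions of the sketch: the exponent bias is 16383, and the 64-bit significand stores its integer bit explicitly (bit 63). Infinities and NaNs are not handled, and the name `decode_extended` is illustrative.

```python
from fractions import Fraction


def decode_extended(raw: bytes) -> Fraction:
    """Decode a 10-byte little-endian x87 extended value (finite values only)."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    bits = int.from_bytes(raw, "little")
    sign = bits >> 79                        # bit 79
    exponent = (bits >> 64) & 0x7FFF         # bits 78-64
    significand = bits & 0xFFFFFFFFFFFFFFFF  # bits 63-0
    if exponent == 0x7FFF:
        raise ValueError("infinity and NaN are not handled in this sketch")
    # Denormals (exponent field 0) use the same scale as exponent field 1.
    unbiased = max(exponent, 1) - 16383
    magnitude = Fraction(significand, 2**63) * Fraction(2) ** unbiased
    return -magnitude if sign else magnitude


one = ((0x3FFF << 64) | (1 << 63)).to_bytes(10, "little")
print(decode_extended(one))  # 1
```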
R0-R7 are treated as a register stack, where R7 is the base and the stack grows towards R0. All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in the TOP field in the FPU status word. The current TOP register is always named ST(0), or simply ST, and ST(i) specifies the $i$-th register from TOP in the stack, where $i \in \{0,\dots,7\}$:
FPU Data Register Stack
7 xxx ST(3)
6 xxx ST(2)
5 xxx ST(1)
4 xxx ST(0) <--- TOP = 100_(2)
3 xxx
2 xxx
1 xxx
0 xxx
Stack growth: the stack grows from the higher registers (R7) towards the lower ones (R0).
The stack-like organization of the FPU registers reflects the way arithmetic operations are performed, in particular the evaluation of a mathematical expression. Unlike the "ordinary" part of the code, which is processed by the general-purpose execution unit, numerical calculations are most conveniently carried out according to the RPN scheme explained earlier, and the design of the floating-point unit is adapted to this scheme.
Load operations decrement TOP by one and load a value into the new top-of-stack register, and store operations store the value from the current TOP register in memory and then increment TOP by one (note that there are also load and store operations that do not move the top of the stack). You can think of a load operation as equivalent to a push and a store operation as equivalent to a pop.
If a load operation is performed when TOP is at 0, register wraparound occurs and the new value of TOP is set to 7. The floating-point stack-overflow exception indicates when wraparound might cause an unsaved value to be overwritten. Many floating-point instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to TOP.
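The interplay of TOP, wraparound and ST(i) addressing can be mimicked with a toy model in Python. This is a deliberately simplified sketch: the class `RegisterStack` and its method names are invented for illustration, TOP is assumed to start at 0, and stack faults are reported as Python exceptions, whereas the real FPU sets status flags instead.

```python
class RegisterStack:
    """Toy model of the eight data registers R0-R7 and the TOP field."""

    def __init__(self) -> None:
        self.regs = [None] * 8  # None marks an empty register
        self.top = 0            # assumed initial value of TOP

    def physical(self, i: int) -> int:
        """Physical register number that ST(i) currently refers to."""
        return (self.top + i) % 8

    def load(self, value: float) -> None:
        """Push: decrement TOP (wrapping 0 -> 7), then fill the new ST(0)."""
        new_top = (self.top - 1) % 8
        if self.regs[new_top] is not None:
            raise OverflowError("stack overflow: target register is not empty")
        self.top = new_top
        self.regs[new_top] = value

    def store_and_pop(self) -> float:
        """Pop: read ST(0), mark it empty, then increment TOP."""
        value = self.regs[self.top]
        if value is None:
            raise IndexError("stack underflow: ST(0) is empty")
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value


fpu = RegisterStack()
for v in (1.0, 2.0, 3.0, 4.0):
    fpu.load(v)
print(fpu.top, fpu.physical(3))      # 4 7
print(fpu.store_and_pop(), fpu.top)  # 4.0 5
```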
ST(0) and ST(1) are used. In this case both arguments are replaced by the result of the instruction:
FADD --> FADDP ST(1), ST(0) --> ST(1) + ST(0) -> ST(1) and free ST(0)
ST(0) and ST(i):
FADD ST(0), ST(i) --> ST(0) + ST(i) -> ST(0)
FADD ST(i), ST(0) --> ST(i) + ST(0) -> ST(i)
ST(i). When the instruction completes, the source argument is popped from the stack:
FADDP ST(i), ST(0) --> ST(i) + ST(0) -> ST(i) and free ST(0)
ST(0):
FADD memory --> ST(0) + memory -> ST(0)
TOP contains 100 (binary), which means that register R4 is the top of the stack). You can do this with code:
FLD [vec1]
FMUL [vec2]
FLD [vec1 + 8]
FMUL [vec2 + 8]
FADD ST(1)
FLD [vec1]
This instruction decrements the stack register pointer (TOP) and loads the value 1.2 from memory into ST(0) (physical register R3).
FMUL [vec2]
The second instruction multiplies the value in ST(0) by the value 5.6 from memory and stores the result in ST(0). At this moment register R3 contains the value 6.72.
FLD [vec1 + 8]
The third instruction decrements TOP and loads the value 3.4 into ST(0). At this moment register R3 contains the value 6.72 while R2 contains 3.4.
FMUL [vec2 + 8]
The fourth instruction multiplies the value in ST(0) by the value 7.8 from memory and stores the result in ST(0). At this moment register R3 contains the value 6.72 and R2 contains the value 26.52.
FADD ST(1)
The fifth instruction adds the value in ST(0) and the value in ST(1) and stores the result in ST(0). At this moment register R3 still contains the value 6.72 and R2 contains the final result, 33.24.
printf function than in the case of the 64-bit convention. Writing the corresponding 64-bit code is left to you as an exercise.
section .data
fmt: db 10,"exception: %d",10,"top: %d",10,"R7 %d",10,"R6 %d",10,"R5 %d"
db 10,"R4 %d",10,"R3 %d",10,"R2 %d",10,"R1 %d",10,"R0 %d",10,0
section .bss
env: resd 7 ; You need 28 bytes for saving the current
; FPU operating environment
section .text
extern printf
global main
main:
finit ; Initialize FPU
fld1 ; Push +1.0 onto the FPU register stack.
fld1
fld1
fld1
call aux_print ; Call auxiliary print code
faddp st3, st0 ; Add ST(0) to ST(i) (in this case i=3),
; store result in ST(i), and pop the
; register stack.
call aux_print
; Exit
mov eax, 0 ; Exit code, 0=normal
ret ; Main returns to operating system
; Auxiliary print code
aux_print:
fstenv [env] ; Saves the current FPU operating environment
; at the memory location specified with
; the destination operand
xor eax, eax
mov ax, [env+8] ; Copy to AX contents of the FPU tag word
mov ecx, 0 ; Set counter as 0
loop: ; do-while loop begin
mov ebx, eax ; At the beginning EAX = Tag Word
and ebx, 3 ; Extract bits 0 and 1
shr eax, 2 ; Shift right to extract next two bits
; in next iteration
push ebx ; Save extracted two bits on the stack
inc ecx ; Increase value of the counter
cmp ecx, 8 ; While condition test
jne loop ; do-while loop end
xor eax, eax ; Clear eax register
fstsw ax ; Save status word
mov ebx, eax
shr bx, 11 ; Shift BX right by 11 to get the top-of-stack
; (TOP) pointer value
and bx, 7 ; A bit-wise AND of the two operands:
; BX and binary pattern 111
push ebx ; Save TOP on the stack
mov ebx, eax ; Prepare to extract some exception flags
;xxxxxxxxx1xxxxx1 bit 6 - Stack Fault (64 decimal)
; bit 0 - Invalid Operation ( 1 decimal)
and bx, 0000000001000001b ; A bit-wise AND of the two operands:
; BX and binary pattern 1000001
push ebx ; Save the selected exception flag bits on the stack
push fmt ; Address of format string
call printf ; Call C function
add esp, 44 ; Pop stack 11*4 bytes
ret
; End of the code
Expected output is:
fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf fpu_test_01_32.asm -o fpu_test_01_32.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -m32 fpu_test_01_32.o -o fpu_test_01_32 -no-pie
/usr/bin/ld: warning: fpu_test_01_32.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
fulmanp@fulmanp-ThinkPad-T540p:~$ ./fpu_test_01_32
exception: 0
top: 4
R7 0
R6 0
R5 0
R4 0
R3 3
R2 3
R1 3
R0 3
exception: 0
top: 5
R7 0
R6 0
R5 0
R4 3
R3 3
R2 3
R1 3
R0 3
finit
FINIT sets the FPU control, status, tag, instruction pointer, and data pointer registers to their default states. The FPU control word is set to 037FH (round to nearest, all exceptions masked, 64-bit precision). The status word is cleared (no exception flags set, TOP is set to 0). The data registers in the register stack are left unchanged, but they are all tagged as empty (11B). Both the instruction and data pointers are cleared.
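The meaning of the default control word 037FH can be checked by decoding its fields. The Python sketch below assumes the usual control word layout, which the text above does not spell out in full: exception mask bits in bits 0-5, precision control (PC) in bits 8-9 and rounding control (RC) in bits 10-11. The function name `decode_control_word` and the mask names IM...PM are used here for illustration.

```python
MASK_NAMES = ["IM", "DM", "ZM", "OM", "UM", "PM"]  # bits 0-5
PRECISION = {0b00: "24-bit", 0b10: "53-bit", 0b11: "64-bit"}
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}


def decode_control_word(cw: int) -> dict:
    """Split a 16-bit x87 control word into masks, precision and rounding."""
    return {
        "masked": [name for bit, name in enumerate(MASK_NAMES)
                   if (cw >> bit) & 1],
        "precision": PRECISION.get((cw >> 8) & 3, "reserved"),
        "rounding": ROUNDING[(cw >> 10) & 3],
    }


print(decode_control_word(0x037F))
# {'masked': ['IM', 'DM', 'ZM', 'OM', 'UM', 'PM'], 'precision': '64-bit', 'rounding': 'nearest'}

# Setting bits 11-10 to 11 switches the rounding mode to truncation.
print(decode_control_word(0x037F | 0x0C00)["rounding"])  # toward zero
```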
fld1
The FLDX instructions, where X is one of 1, L2T, L2E, PI, LG2, LN2, Z, push one of seven commonly used constants (in double extended-precision floating-point format) onto the FPU register stack. The constants that can be loaded with these instructions are +1.0 (1), +0.0 (Z), $\log_{2}10$ (L2T), $\log_{2}e$ (L2E), $\pi$ (PI), $\log_{10}2$ (LG2), and $\log_{e}2$ (LN2).
FADDP ST(i), ST(0) (for i=3) adds ST(0) to ST(i), stores the result in ST(i), and pops the register stack.
FADD/FADDP/FIADD).
fstenv
FSTENV saves the current FPU operating environment at the memory location specified by the destination operand, and then masks all floating-point exceptions. The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode.
 31        16 15          0
|xxxxxxxxxxx|Control Word| B3  - B0
|xxxxxxxxxxx|Status Word | B7  - B4
|xxxxxxxxxxx|Tag Word    | B11 - B8
|                        | B15 - B12
|                        | B19 - B16
|                        | B23 - B20
|                        | B27 - B24
x - not used
The contents of bytes B12-B27 depend on the mode
and are not relevant to this example.
mov ax, [env+8]
Copies the contents of the FPU tag word to AX. Next, every 2-bit pair is extracted and associated with the corresponding floating-point register.
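The same extraction of 2-bit pairs can be written in Python. The sketch assumes that the tag for physical register R0 sits in bits 1-0 and the tag for R7 in bits 15-14, and it uses the standard tag meanings (00 valid, 01 zero, 10 special, 11 empty); the function name `decode_tag_word` and the sample value 0x00FF are illustrative.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word: int) -> list:
    """Return the 2-bit tags for R0..R7, in that order."""
    return [(tag_word >> (2 * n)) & 3 for n in range(8)]


# Sample tag word: R0-R3 empty, R4-R7 holding valid values.
tags = decode_tag_word(0x00FF)
for n in reversed(range(8)):
    print(f"R{n} {tags[n]} ({TAG_NAMES[tags[n]]})")
```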
section .data
fmt: db "result is %d", 10, 0
a: dq 2.5
b: dq 3.0
section .bss
tmp: resq 1
buf: resw 1
section .text
extern printf
global main
main:
finit ; Initialize FPU
fstcw [buf] ; Save control word
;xxxx11xxxxxxxxxx
or word [buf], 0000110000000000b ; Bits 11-10 control rounding:
; 00 round to nearest (default),
; 01 round down, [0,1) -> 0 [1,2) -> 1 [-1,0) -> -1 [-2,-1) -> -2
; 10 round up, (0,1] -> 1 (1,2] -> 2 (-1,0] -> 0 (-2,-1] -> -1
; 11 round toward zero [0,1) -> 0 [1,2) -> 1 (-1,0] -> 0 (-2,-1] -> -1
; for positives behaves like round down
; for negatives behaves like round up
fldcw [buf] ; Load updated control word
fld qword [a] ; Load a to FPU
fmul qword [b] ; Multiply by b
fist dword [tmp] ; Cast result to int
push dword [tmp]
push fmt
call printf
add esp, 8
; Exit
mov eax, 0 ; Exit code, 0=normal
ret ; Main returns to operating system
; End of the code
fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf fpu_test_02_32.asm -o fpu_test_02_32.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -m32 fpu_test_02_32.o -o fpu_test_02_32 -no-pie
/usr/bin/ld: warning: fpu_test_02_32.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
fulmanp@fulmanp-ThinkPad-T540p:~$ ./fpu_test_02_32
result is 7
section .data
fmt: db "status word value %d", 10, 0
a: dq 2.5
b: dq 0.0
section .bss
tmp: resq 1
buf: resw 1
section .text
extern printf
global main
main:
finit ; Initialize FPU
fld qword [a] ; Load a to FPU
fdiv qword [b] ; Divide by b
xor eax, eax
fstsw ax ; Stores the current value of the FPU status word
; in the destination location. The destination
; operand can be either a two-byte memory location
; or the AX register.
push eax
push fmt
call printf
add esp, 8
; Exit
mov eax, 0 ; Exit code, 0=normal
ret ; Main returns to operating system
; End of the code
fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf fpu_test_03_32.asm -o fpu_test_03_32.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -m32 fpu_test_03_32.o -o fpu_test_03_32 -no-pie
/usr/bin/ld: warning: fpu_test_03_32.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
fulmanp@fulmanp-ThinkPad-T540p:~$ ./fpu_test_03_32
status word value 14340
Decimal value 14340 is 11100000000100 in binary (3804H). Bit 2 is set, which means that the ZE (Zero Divide) flag was raised. Bits 11 through 13 hold 111, so TOP has decimal value 7.