Initial version: 2025-02-25
Last update: 2025-03-02
CALL and RET instructions.
General registers
111111
5432109876543210
|______AX______| Accumulator
|__AH__||__AL__|
|______BX______| Base
|__BH__||__BL__|
|______CX______| Count
|__CH__||__CL__|
|______DX______| Data
|__DH__||__DL__|
Pointers and index
|______SP______| Stack Pointer
|______BP______| Base Pointer
|______SI______| Source Index
|______DI______| Destination Index
Segment
|______CS______| Code
|______DS______| Data
|______SS______| Stack
|______ES______| Extra
Program status
|____FLAGS_____|
|______IP______|
When the architecture was extended to 32 bits, Intel decided to preserve backward compatibility, so the layout of the general registers stayed the same, but with an "extended" higher part above bit 15 (which probably explains why the letter E prefixes the "old" names):
80386, Pentium
General registers
3322222222221111111111
10987654321098765432109876543210
|______________E?X_____________|
|______?X______|
|__?H__||__?L__|
? = A, B, C or D
Pointers and index
|______________ESP_____________|
|______________EBP_____________|
|______________ESI_____________|
|______________EDI_____________|
Program status
|____________EFLAGS____________|
|______________EIP_____________|
A similar approach was applied in the transition to the 64-bit architecture.
Intel, AMD 64-bit
666655555555554444444444333333333322222222221111111111
3210987654321098765432109876543210987654321098765432109876543210
| || || || || || || || |
|__64__||__56__||__48__||__40__||__32__||__24__||__16__||__8___|
|_____________________________RAX______________________________|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|______________EAX_____________|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|______AX______|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|__AH__||__AL__|
RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8-R15
General purpose registers: EAX, EBX, ECX, EDX, ESI, EDI
EBP -- Base Pointer
ESP -- Stack pointer
x64 extends x86's 8 general-purpose registers to be 64-bit, and adds 8 new 64-bit registers. The 64-bit registers have names beginning with R, so for example the 64-bit extension of EAX is called RAX. The new registers are named R8 through R15. Why R? Some people say R comes from "really extended", which is a nice explanation, but R probably comes simply from "register".
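The lower 32 bits, 16 bits, and 8 bits of each register are directly addressable. This includes registers such as ESI, whose lower 8 bits were not previously addressable. The following table specifies the assembly-language names for the lower portions of 64-bit registers: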
64-bit register | Lower 32 bits | Lower 16 bits | Lower 8 bits
==============================================================
rax | eax | ax | al
rbx | ebx | bx | bl
rcx | ecx | cx | cl
rdx | edx | dx | dl
rsi | esi | si | sil
rdi | edi | di | dil
rbp | ebp | bp | bpl
rsp | esp | sp | spl
r8 | r8d | r8w | r8b
r9 | r9d | r9w | r9b
r10 | r10d | r10w | r10b
r11 | r11d | r11w | r11b
r12 | r12d | r12w | r12b
r13 | r13d | r13w | r13b
r14 | r14d | r14w | r14b
r15 | r15d | r15w | r15b
Sometimes rXb is written as rXl, where l stands for lower.
def int_div(dividend, divisor):
    # Integer division: returns (quotient, remainder)
    integer_part = dividend // divisor
    remainder = dividend % divisor
    return (integer_part, remainder)

def print_number(number):
    n = number
    while n > 0:
        integer_part, remainder = int_div(n, 10)
        n = integer_part
        digit = remainder
        print(digit, end='')

number = 12345
print_number(number)
int_div(dividend, divisor) is a helper function performing integer division of two integers. For example, int_div(27, 10) returns the tuple (2, 7). Running the code above prints:
54321
As you can see, this algorithm has a drawback: it prints the digits in reverse order, so the least significant digit is printed leftmost. You can fix it easily by introducing a buffer:
def print_number(number):
    n = number
    buffer = ""
    while n > 0:
        integer_part, remainder = int_div(n, 10)
        n = integer_part
        digit = remainder
        # Prepend, so the most significant digit ends up first
        buffer = str(digit) + buffer
    print(buffer)
Now the result is correct:
12345
DIV (unsigned integer divide) divides the unsigned integer value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using a 64-bit operand is available only in 64-bit mode. This instruction has the following formats:
Operand Size             Dividend  Divisor  Quotient  Remainder  Maximum Quotient
Word/byte                AX        r/m8     AL        AH         255
Doubleword/word          DX:AX     r/m16    AX        DX         65,535
Quadword/doubleword      EDX:EAX   r/m32    EAX       EDX        2^32 - 1
Doublequadword/quadword  RDX:RAX   r/m64    RAX       RDX        2^64 - 1
For example consider the following division:
AX = 10111011 01111110 = 47998_10
BL = 11000000 = 192_10
AX / BL -> AL = 249_10 (11111001), AH = 190_10 (10111110)
AX = 10111110 11111001
However, life is not always so easy. The quotient (the integer part of the result) may not fit into the designated register:
AX = 11111111 11111111 = 65535_10
BL = 00000010 = 2_10
AX / BL -> 32767_10 + 1_10
32767_10 = 01111111 11111111 > 11111111 = FFh
When the quotient does not fit into the destination register, the #DE (Divide Error) exception is raised.
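The behaviour of the byte-sized form of DIV can be sketched in Python. This is only a model, not how the processor is implemented: the function name div8 is made up for illustration, and Python exceptions stand in for the #DE exception.

```python
def div8(ax, divisor):
    """Model of DIV r/m8: AX / divisor -> (AL = quotient, AH = remainder)."""
    if not 0 <= ax <= 0xFFFF or not 0 <= divisor <= 0xFF:
        raise ValueError("operands out of range for the 8-bit form")
    if divisor == 0:
        raise ZeroDivisionError("#DE: division by zero")
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        # The quotient must fit into AL (8 bits)
        raise OverflowError("#DE: quotient does not fit into AL")
    return quotient, remainder


# First example from the text: 47998 / 192
al, ah = div8(47998, 192)
print(f"AL = {al}, AH = {ah}")

# Second example from the text: 65535 / 2 overflows AL
try:
    div8(65535, 2)
except OverflowError as err:
    print(err)
```

The same check explains why the assembly examples in this text clear EDX before every div ecx: with a 32-bit divisor the dividend is EDX:EAX, and a stale value in EDX could make the quotient too large for EAX.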
With DIV alone it would not be possible to implement anything, so you need some other instructions: CMP, INC, JMP, JNE, MOV, SUB and XOR. Let's take a look at them one by one.
CMP instruction
CMP first, second compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the result. The comparison is performed by subtracting the second operand from the first operand (tmp = first - second) and then setting the status flags in the same manner as the SUB instruction; the result itself is discarded. For unsigned operands the two flags of interest are ZF and CF; the examples below show how they behave.
AX < BX
MOV AX,5
MOV BX,8
CMP AX,BX
Result: ZF = 0 and CF = 1
AX > BX
MOV AX,8
MOV BX,5
CMP AX,BX
Result: ZF = 0 and CF = 0
AX = BX
MOV AX,5
MOV BX,AX
CMP AX,BX
Result: ZF = 1 and CF = 0
The CMP instruction is typically used in conjunction with a conditional jump from the Jcc family, a conditional move (CMOVcc family), or a SETcc instruction.
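The three CMP cases above can be reproduced with a small Python model. It is a sketch for unsigned operands only: the helper name cmp_flags and the default width of 16 bits are assumptions made for this illustration, and only ZF and CF are modelled.

```python
def cmp_flags(first, second, bits=16):
    """Return (ZF, CF) as CMP would set them for unsigned operands."""
    mask = (1 << bits) - 1
    first &= mask
    second &= mask
    tmp = (first - second) & mask     # CMP computes this and throws it away
    zf = 1 if tmp == 0 else 0         # zero flag: operands are equal
    cf = 1 if first < second else 0   # carry flag: subtraction needed a borrow
    return zf, cf


for ax, bx in [(5, 8), (8, 5), (5, 5)]:
    zf, cf = cmp_flags(ax, bx)
    print(f"AX={ax} BX={bx} -> ZF={zf} CF={cf}")
```

A conditional jump such as JNE then only has to look at ZF; it does not need to know which operands were compared.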
INC instruction
INC what adds 1 to the destination operand what, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag.
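When the destination operand is a memory location, INC can be used with the LOCK prefix so that the instruction is executed atomically (LOCK with a register destination is invalid):
lock inc dword [ebx]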
JMP instruction
JMP where transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.
JNE instruction
JNE where belongs to the Jcc family of conditional jumps. A Jcc instruction checks the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags are in the specified state (condition), performs a jump to the target instruction specified by the destination operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for; for JNE (jump if not equal) the condition is ZF = 0. If the condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc instruction.
The JRCXZ, JECXZ, and JCXZ instructions differ from other Jcc instructions because they do not check status flags. Instead, they check RCX, ECX or CX for 0.
MOV instruction
MOV dst, src copies the second operand (src, the source operand) to the first operand (dst, the destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination operand can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.
SUB instruction
SUB dst, src subtracts the second operand (src, the source operand) from the first operand (dst, the destination operand) and stores the result in the destination operand:
dst := dst - src
The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location (however, two memory operands cannot be used in one instruction). When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.
XOR instruction
XOR dst, src performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location:
dst := dst XOR src
The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location (however, two memory operands cannot be used in one instruction). Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.
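The bit rule can be checked quickly in Python, where ^ is the bitwise XOR operator. The helper xor_bits and the 8-bit width are chosen only for this illustration. The second part shows why XOR of a register with itself is a common way to set it to 0: every bit is compared with an identical bit.

```python
def xor_bits(dst, src, bits=8):
    """Bitwise XOR limited to a register of the given width."""
    mask = (1 << bits) - 1
    return (dst ^ src) & mask


# Bits that differ give 1, bits that are the same give 0
result = xor_bits(0b11001010, 0b10100110)
print(format(result, "08b"))

# XOR of a value with itself is always 0, whatever the value was
print(all(xor_bits(value, value) == 0 for value in range(256)))
```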
section .data ; Data section
transTab: db "0123456789" ; Translation table
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
_start:
; Put data to print into
; EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, 12345 ; Set EAX to the number you want to print
jmp printNumber ; Let's print
; Print number code: begin
; Init
printNumber:
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result ; Set EBX part of RBX to point to the beginning of the buffer
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
inc ebx ; Move to the next byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
print:
sub ebx, result ; Calculate length of a string to print
mov rdx, rbx ; arg3: length of a string to print
mov rsi, result ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
I hope the code with my comments is clear, but I want to comment on one instruction:
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
If you look at the two preceding instructions:
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
you see that EDX contains the remainder of the division. Because you divide by 10, the remainder is in the range from 0 to 9. Notice that when the lowest digit is 5, the remainder is 5, when the lowest digit is 7, the remainder is 7, etc. So the remainder is the position in transTab of the corresponding digit. This is exactly what [transTab + edx] calculates: EDX is a position in the table transTab or, more properly, an offset from the address transTab, which is the beginning of the translation table.
ECX is a 32-bit register, so you transfer from memory 32 bits starting at address transTab + edx:
div(13,10) -> remainder=3
beginning of transTab
|
|  transTab+remainder
|  |
0123456789
   ||||
   |||4th byte to transfer
   ||3rd byte to transfer
   |2nd byte to transfer
   1st byte to transfer
You need only the first byte, but all four bytes will be transferred. In consequence, executing mov ecx, [transTab + edx] transfers 4 bytes into ECX.
If you then stored the whole register with
mov [ebx], ecx
you would copy 4 bytes to the buffer starting at the address given in EBX. This gets you into trouble when you are close to the end of the buffer. For example, if EBX holds the last valid address, the instruction above puts the 1st byte at the last address, but then the 2nd byte at last+1, the 3rd byte at last+2 and finally the 4th byte at last+3. As you can see, this way you exceed your buffer by 3 bytes and possibly destroy other data.
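The overrun can be made visible with a small Python model of memory. Everything here is an assumption made for the illustration: memory is a bytearray, the three guard bytes stand for unrelated data placed right after the buffer, and the helper store mimics a little-endian MOV to memory.

```python
BUFFER_SIZE = 16
GUARD = b"\xAA\xAA\xAA"   # hypothetical unrelated data right after the buffer


def store(memory, address, value, size):
    """Store the low `size` bytes of a 32-bit value at `address`, little-endian."""
    memory[address:address + size] = value.to_bytes(4, "little")[:size]


def run(size):
    memory = bytearray(BUFFER_SIZE) + bytearray(GUARD)
    ecx = int.from_bytes(b"3456", "little")   # what mov ecx, [transTab + 3] loads
    last = BUFFER_SIZE - 1                    # EBX points at the last buffer byte
    store(memory, last, ecx, size)
    return memory


one_byte = run(1)      # mov [ebx], cl
four_bytes = run(4)    # mov [ebx], ecx

print(bytes(one_byte[BUFFER_SIZE:]) == GUARD)     # guard untouched
print(bytes(four_bytes[BUFFER_SIZE:]) == GUARD)   # guard overwritten
```

In both cases the byte at the last buffer position is the digit '3'; only the one-byte store leaves the data behind the buffer alone.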
section .data ; Data section
transTab: db "0123456789" ; Translation table
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
_start:
; Put data to print into
; EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, 12345 ; Set EAX to the number you want to print
jmp printNumber ; Let's print
; BEGIN: Print number code
; Init
printNumber:
xor rbx, rbx ; Clear RBX register (set it to 0)
; mov ebx, result ; Set EBX to point to the beginning of the buffer
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer UPDATED
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
; inc ebx ; Move to the next byte in the buffer
dec ebx ; Move to the previous byte in the buffer UPDATED
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
print:
inc ebx ; EBX was decremented past the first digit; point it back at that digit
; sub ebx, result ; Calculate length of a string to print
; Calculate length of a string to print UPDATED
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length UPDATED
mov rdx, rax ; arg3: length of a string to print
; mov rsi, result ; arg2: pointer to a string
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string UPDATED
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print number code
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
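The right-to-left fill used by this version can be modelled in Python to check the pointer arithmetic. This is a sketch, not a translation of the assembly: the function name fill_from_end is made up, the buffer is a bytearray, and pos plays the role of EBX as an index instead of an address. Note that after the loop pos sits one slot before the first digit, so the string starts at pos + 1.

```python
def fill_from_end(number, size=16):
    """Write the decimal digits of `number` into the end of a buffer.

    Returns (start, length, text): where the string starts in the buffer,
    how many bytes it has, and the bytes themselves.
    """
    buffer = bytearray(size)
    pos = size - 1                    # like EBX = result + 15
    n = number
    while True:                       # do-while, as in the assembly loop
        n, remainder = divmod(n, 10)
        buffer[pos] = ord("0") + remainder
        pos -= 1                      # like dec ebx
        if n == 0:
            break
    start = pos + 1                   # first digit written
    length = size - start             # like (result + 16) - start
    return start, length, bytes(buffer[start:start + length])


print(fill_from_end(12345))
print(fill_from_end(0))
```

Because the digits are written from the least significant one towards the front, no separate reversing step is needed before printing.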
The code is not much different from the previous version; however, one part needs an explanation. In the previous version you had:
mov rsi, result ; arg2: pointer to a string
Note that RSI is a 64-bit register while result is a numeric constant (an address). This constant is encoded as an immediate value that fills the whole 64-bit register.
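In the new version the pointer to the string is kept in the EBX register. Because EBX is a 32-bit register, it fits into ESI, which is the lower part of the 64-bit RSI register. If you simply do:
mov esi, ebx
the lower part of RSI will be equal to EBX. What about the higher part of RSI? It has to be equal to 0 for the whole RSI to be equal to the pointer to the string (which is EBX). In 64-bit mode a write to a 32-bit register zero-extends the value into the full 64-bit register, so the upper 32 bits of RSI become 0 automatically. This is not true for writes to 8-bit and 16-bit registers, which leave the upper bits unchanged.
Clearing RSI (setting it to 0) with the XOR instruction first is therefore not strictly required here, but it is harmless and makes the intent explicit: after it, RSI is certainly equal to EBX; hence the sequence of instructions: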
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string UPDATED
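How writes to the different parts of a 64-bit register affect the whole register in 64-bit mode can be modelled in Python. The helper names write32, write16 and write8_low are invented for this sketch, and a register is represented by a plain integer.

```python
MASK64 = (1 << 64) - 1


def write32(reg64, value):
    """mov esi, ...: the upper 32 bits of the 64-bit register become 0."""
    return value & 0xFFFFFFFF


def write16(reg64, value):
    """mov si, ...: only bits 0-15 change, the rest is preserved."""
    return (reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write8_low(reg64, value):
    """mov sil, ...: only bits 0-7 change, the rest is preserved."""
    return (reg64 & ~0xFF & MASK64) | (value & 0xFF)


rsi = 0xDEADBEEFCAFEBABE                  # some leftover value in RSI
print(hex(write32(rsi, 0x00402010)))      # upper half cleared
print(hex(write16(rsi, 0x2010)))          # upper 48 bits kept
print(hex(write8_low(rsi, 0x10)))         # upper 56 bits kept
```

The model shows why a pointer moved through a 32-bit register arrives intact in the 64-bit one, while the same trick with a 16-bit or 8-bit register would need an explicit clear first.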
section .text ; Code section
global _start
_start:
mov dx, 0 ; dividend - higher half
mov ax, 16 ; dividend - lower half
mov cx, 5 ; divisor
div cx ; div dx:ax by cx
; Exit
; Use exit code to get result
mov rdi, rax ; Quotient
; or
;mov rdi, rdx ; Remainder
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
As you can see, you divide 16 by 5 and right after this you finish execution by calling the sys_exit system function. Making this call in the previous examples you wrote:
mov rdi, 0 ; Exit code, 0=normal
Nothing prevents you from returning a value different from 0: it does not influence the execution of your program, your operating system or anything else. It is just information for the caller, who can interpret the number to decide whether execution was successful or not. Typically 0 means a normal end of execution, but it is only a convention.
You can therefore use sys_exit to return either the quotient or the remainder:
[... uncomment quotient, comment remainder ...]
fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 inst_64_div.asm
fulmanp@fulmanp-k2:~/assembler$ ld inst_64_div.o -o inst_64_div
fulmanp@fulmanp-k2:~/assembler$ ./inst_64_div
fulmanp@fulmanp-k2:~/assembler$ echo $?
3
[... comment quotient, uncomment remainder ...]
fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 inst_64_div.asm
fulmanp@fulmanp-k2:~/assembler$ ld inst_64_div.o -o inst_64_div
fulmanp@fulmanp-k2:~/assembler$ ./inst_64_div
fulmanp@fulmanp-k2:~/assembler$ echo $?
1
$? is a special variable in Bash that always holds the return/exit code of the last executed command. You can view it in a terminal by running echo $?.
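The caller's side of this convention can be sketched in Python: a parent process starts a child and reads its exit code, which is what the shell does before storing the value in $?. The helper exit_code_of is hypothetical, and the child here is another Python interpreter instead of the assembled program.

```python
import subprocess
import sys


def exit_code_of(code):
    """Run a child process that exits with `code`; return what the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return child.returncode


print(exit_code_of(3))   # like returning the quotient
print(exit_code_of(1))   # like returning the remainder
```

Replacing the command list with the path to the assembled binary (for example ./inst_64_div from the session above) would read its exit code in the same way.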
sys_exit ends execution of your program.
ntp is the label of the memory location designated to keep the number to be printed (ntp: number to print). You also need another fixed memory location designated to preserve the address where to return from the printing routine and continue execution; this is the wtr label (where to return).
section .data ; Data section
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
ntp: dd 0 ; Number to print
; Here you have to move every number
; you want to print
wtr: dq 0 ; Where to return from print call
number1: dd 12345
number2: dd 67890
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
; BEGIN: Print number code
; Init
printNumber:
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
print:
inc ebx ; EBX was decremented past the first digit; point it back at that digit
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
jmp [wtr] ; Jump to the address saved in wtr (where to return)
; before print call
; END: Print number code
_start:
; Put data `number1` to print into
; EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [number1] ; Set EAX to the number you want to print
mov qword [wtr], cont1 ; Set return address from print routine
jmp printNumber ; "Call" print routine
cont1:
; Put data `number2` to print into
; EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [number2] ; Set EAX to the number you want to print
mov qword [wtr], cont2 ; Set return address from print routine
jmp printNumber ; "Call" print routine
cont2:
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
The routine uses the registers RAX, RBX, RCX, RDX, RSI and RDI, so you need space to store all of them. You also need a method to pass a return address to the routine. In the code you have now, this is what you use wtr for. However, it would be nice if you could use some other, well-known address space without explicit names like wtr. A good idea is to use a stack with its POP and PUSH instructions. The general idea of the stack was explained in XXX, so please look there for clarification if you do not remember. With the stack, the above three-step algorithm looks as follows:
RAX, RBX, RCX, RDX, RSI, RDI exactly in that order.
BP - base pointer
SP - stack pointer
address BP xxxx
address- 8 BP-[1*8= 8] DATA SP+[7*8=56]
address-16 BP-[2*8=16] WTR SP+[6*8=48]
address-24 BP-[3*8=24] RAX SP+[5*8=40]
address-32 BP-[4*8=32] RBX SP+[4*8=32]
address-40 BP-[5*8=40] RCX SP+[3*8=24]
address-48 BP-[6*8=48] RDX SP+[2*8=16]
address-56 BP-[7*8=56] RSI SP+[1*8= 8]
address-64 BP-[8*8=64] RDI SP
In the stack given above, access to the saved RCX register, for example, is possible relative to the top of the stack given by the RSP value: RSP+24.
address |some data | <-- BP, SP Bottom of the stack (BP). It is also tip of the stack ("current" position of the SP).
address- 1| |
address- 2| |
When you push the first data onto the stack, say DATA, the stack will contain:
address |xxxxxxxxxxx| <-- BP bottom of the stack. Address of the first byte you can save on the stack
address- 1|DATA byte 8|
address- 2|DATA byte 7|
address- 3|DATA byte 6|
address- 4|DATA byte 5|
address- 5|DATA byte 4|
address- 6|DATA byte 3|
address- 7|DATA byte 2|
address- 8|DATA byte 1| <-- BP-8 offset by 8 bytes (64 bits) from the bottom of the stack.
This is also a "current" position of the SP
When you push the next data onto the stack, say WTR, it will contain:
address |xxxxxxxxxxx| <-- BP bottom of the stack. Address of the first byte you can save on the stack
address- 1|DATA byte 8|
address- 2|DATA byte 7|
address- 3|DATA byte 6|
address- 4|DATA byte 5|
address- 5|DATA byte 4|
address- 6|DATA byte 3|
address- 7|DATA byte 2|
address- 8|DATA byte 1| <-- BP-8 == SP+8
address- 9|WTR byte 8|
address-10|WTR byte 7|
address-11|WTR byte 6|
address-12|WTR byte 5|
address-13|WTR byte 4|
address-14|WTR byte 3|
address-15|WTR byte 2|
address-16|WTR byte 1| <-- BP-16 offset by 16 bytes (2 x 64 bits) from the bottom of the stack.
This is also a "current" position of the SP
From the above "zoom" you can see that to get byte 1 of DATA you can use either the BP-8 address or SP+8.
section .data ; Data section
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
number1: dq 12345 ; 64-bit, because it is pushed with push qword
number2: dq 67890
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
; BEGIN: Print number code
; Init
printNumber:
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rsp+56] ; Set EAX to the number you want to print;
; this number is on the stack
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
print:
inc ebx ; EBX was decremented past the first digit; point it back at that digit
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return),
; pushed just before the print call (after all arguments needed
; by the routine)
; END: Print number code
_start:
push qword [number1] ; Push onto the stack the 1st argument: the number to be printed
push qword cont1 ; Push onto the stack where to return from the routine
jmp printNumber ; "Call" print routine
cont1:
add rsp, 16 ; Clear the stack: drop the return address and the argument
push qword [number2] ; Push onto the stack the 1st argument: the number to be printed
push qword cont2 ; Push onto the stack where to return from the routine
jmp printNumber ; "Call" print routine
cont2:
add rsp, 16 ; Clear the stack: drop the return address and the argument
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
Much of the code stays untouched. The changes concern:
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rsp+56] ; Set EAX to the number you want to print;
; this number is on the stack
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return),
; pushed just before the print call (after all arguments needed
; by the routine)
_start:
push qword [number1] ; Push onto the stack the 1st argument: the number to be printed
push qword cont1 ; Push onto the stack where to return from the routine
jmp printNumber ; "Call" print routine
cont1:
add rsp, 16 ; Clear the stack: drop the return address and the argument
push qword [number2] ; Push onto the stack the 1st argument: the number to be printed
push qword cont2 ; Push onto the stack where to return from the routine
jmp printNumber ; "Call" print routine
cont2:
add rsp, 16 ; Clear the stack: drop the return address and the argument
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rsp+56] ; Set EAX to the number you want to print;
; this number is on the stack
The problem with this code is that the offset to the first argument, equal to 56, depends on the "local" values you put onto the stack just after the routine call (the sequence of pushes that save registers). If you change them (add another push or remove some of the existing ones), you have to remember to change 56 to the correct value. This is not bad, but it is easy to forget. You can avoid it, because the location of the arguments does not depend on any local activity on the stack (assuming you do not destroy the stack contents).
The solution is to address the arguments relative not to RSP, which constantly changes, but to RBP, which points to the bottom of the current stack frame. For this reason, in most cases every function begins with the well-known sequence of instructions called the function prologue:
; Classic function prologue
push rbp
mov rbp, rsp
What this does is “save” the current position of the base pointer (the bottom of the “current” stack frame) with push rbp and replace it with the stack pointer (the tip/top of the stack) with mov rbp, rsp. So the new base pointer is the current top of the stack.
address old BP xxxx
address- 8 new BP+[2*8=16] DATA SP+[8*8=64]
address-16 new BP+[1*8= 8] WTR SP+[7*8=56]
address-24 new BP RBP SP+[6*8=48] <- save "old" base pointer; from now (new) base pointer = (current) stack pointer
address-32 new BP-[1*8= 8] RAX SP+[5*8=40]
address-40 new BP-[2*8=16] RBX SP+[4*8=32]
address-48 new BP-[3*8=24] RCX SP+[3*8=24]
address-56 new BP-[4*8=32] RDX SP+[2*8=16]
address-64 new BP-[5*8=40] RSI SP+[1*8= 8]
address-72 new BP-[6*8=48] RDI SP
Consequently, if you have a function prologue, you should have a function epilogue to restore the "old" base pointer:
; Classic function epilogue
pop rbp
; Return from routine
With prologue and epilogue you have to modify only one instruction and replace:
mov eax, [rsp+56] ; Set EAX to the number you want to print;
with
mov eax, [rbp+16] ; Set EAX to the number you want to print;
With this change you always have access to the first argument at rbp+16.
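The difference between RSP-relative and RBP-relative addressing can be tried out on a toy model of the stack. This is only a sketch: the class Stack, the helper build_frame, the starting address 0x1000 and the use of strings instead of real 8-byte values are all assumptions made for the illustration.

```python
class Stack:
    """Tiny model of a downward-growing stack of 8-byte slots."""

    def __init__(self, top=0x1000):
        self.memory = {}
        self.rsp = top

    def push(self, value):
        self.rsp -= 8                 # the stack grows towards lower addresses
        self.memory[self.rsp] = value

    def read(self, address):
        return self.memory[address]


def build_frame(extra_pushes=0):
    """Replay the caller's pushes, the prologue and the register saves."""
    stack = Stack()
    stack.push("DATA")                # argument pushed by the caller
    stack.push("WTR")                 # return address pushed by the caller
    stack.push("old RBP")             # prologue: push rbp
    rbp = stack.rsp                   # prologue: mov rbp, rsp
    for name in ["RAX", "RBX", "RCX", "RDX", "RSI", "RDI"]:
        stack.push(name)
    for _ in range(extra_pushes):
        stack.push("local")           # any additional local push
    return stack, rbp


stack, rbp = build_frame()
print(stack.read(rbp + 16), stack.read(stack.rsp + 64))

stack, rbp = build_frame(extra_pushes=1)
print(stack.read(rbp + 16), stack.read(stack.rsp + 64))
```

With one extra push the RSP-relative offset 64 no longer reaches the argument, while rbp + 16 still does, which is the whole point of the prologue.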
section .data ; Data section
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
number1: dq 12345 ; 64-bit, because it is pushed with push qword
number2: dq 67890
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
; BEGIN: Print number code
; Init
printNumber:
; Classic function prologue
push rbp
mov rbp, rsp
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rbp+16] ; Set EAX to the number you want to print;
; this number is on the stack
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
print:
inc ebx ; EBX was decremented past the first digit; point it back at that digit
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
; Classic function epilogue
pop rbp
; Return from routine
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return).
; It was pushed just before the jump to the print routine, after all
; arguments needed by the routine. Note that JMP does not pop it.
; END: Print number code
_start:
push qword [number1] ; Push onto the stack the 1st argument: the number to be printed
push qword cont1 ; Push onto the stack the address to return to from the routine
jmp printNumber ; "Call" print routine
cont1:
add rsp, 16 ; Clean the stack: drop the return address and the argument
push qword [number2] ; Push onto the stack the 1st argument: the number to be printed
push qword cont2 ; Push onto the stack the address to return to from the routine
jmp printNumber ; "Call" print routine
cont2:
add rsp, 16 ; Clean the stack: drop the return address and the argument
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
routine_print.asm
section .data
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
sys_exit equ 60
sys_write equ 1
stdout equ 1
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digits
section .text
global print_number_32
global exit
; BEGIN: Print number code
; Init
print_number_32:
; Classic function prologue
push rbp
mov rbp, rsp
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Load the number to print from the stack into EDX:EAX.
; Stack layout at this point: [rbp] = saved RBP,
; [rbp+8] = return address, [rbp+16] = the argument.
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rbp+16] ; Set EAX to the number you want to print;
; this number is on the stack
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to the remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
print:
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, stdout ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, sys_write ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, stdout ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, sys_write ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
; Classic function epilogue
pop rbp
; Return from routine
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return).
; It was pushed just before the jump to the print routine, after all
; arguments needed by the routine. Note that JMP does not pop it.
; END: Print number code
; BEGIN: Exit
exit:
mov rdi, 0 ; Exit code, 0=normal
mov rax, sys_exit ; System call number (sys_exit)
syscall ; Call a system function
; END: Exit
main.asm
section .data ; Data section
number1: dd 12345
number2: dd 67890
section .text
extern print_number_32
extern exit
global _start
_start:
push qword [number1] ; Push onto the stack the 1st argument: the number to be printed
push qword cont1 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont1:
add rsp, 16 ; Clean the stack: drop the return address and the argument
push qword [number2] ; Push onto the stack the 1st argument: the number to be printed
push qword cont2 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont2:
add rsp, 16 ; Clean the stack: drop the return address and the argument
jmp exit
; End of the code
Compilation and execution result:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ nasm -f elf64 routine_print.asm -o routine_print.o
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ nasm -f elf64 main.asm -o main.o
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ ld main.o routine_print.o -o print_test
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ ./print_test
12345
67890
CALL and RET instructions. In main.asm replace:
_start:
push qword [number1] ; Push onto the stack the 1st argument: the number to be printed
push qword cont1 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont1:
add rsp, 16 ; Clean the stack: drop the return address and the argument
push qword [number2] ; Push onto the stack the 1st argument: the number to be printed
push qword cont2 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont2:
add rsp, 16 ; Clean the stack: drop the return address and the argument
jmp exit
; End of the code
with:
_start:
push qword [number1] ; Push onto the stack the 1st argument: the number to be printed
call print_number_32 ; Call print routine
add rsp, 8 ; Clean the stack: drop the argument (RET already popped the return address)
push qword [number2] ; Push onto the stack the 1st argument: the number to be printed
call print_number_32 ; Call print routine
add rsp, 8 ; Clean the stack: drop the argument (RET already popped the return address)
call exit
; End of the code
and in routine_print.asm
replace:
jmp [rsp]
with:
ret
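Why the replacement works: CALL pushes the address of the next instruction onto the stack and jumps to the routine, and RET pops that address and jumps to it. The sketch below is only an illustration, assuming Linux x86-64 and NASM as in the rest of this text; the labels `demo` and `back` are hypothetical and do not appear in the files above.

```nasm
section .text
global _start

; Hypothetical routine: does nothing, just returns to the caller.
demo:
ret                 ; Pop the return address from the stack and jump to it

_start:
; Hand-made call: push the return address, then jump
push qword back     ; Address to continue at after the routine
jmp demo            ; RET inside demo pops `back` and jumps there
back:
; The same thing done with a single instruction
call demo           ; Pushes the address of the next instruction, jumps to demo

mov rdi, 0          ; Exit code, 0=normal
mov rax, 60         ; System call number (sys_exit)
syscall             ; Call a system function
```

It can be assembled and linked the same way as the other examples (the file name `call_demo.asm` is an assumption): `nasm -f elf64 call_demo.asm -o call_demo.o`, then `ld call_demo.o -o call_demo`. Because RET removes the return address from the stack, which `jmp [rsp]` does not, the caller only has to remove the arguments it pushed.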
Sometimes, mostly when you use online compilers, you have to keep all the code in one file. For that case you can download the final version of the print routine in a single file.