Initial version: 2025-02-25
Last update: 2025-03-02
and RET
General registers
|______AX______| Accumulator
|______BX______| Base
|______CX______| Count
|______DX______| Data
Pointers and index
|______SP______| Stack Pointer
|______BP______| Base Pointer
|______SI______| Source Index
|______DI______| Destination Index
|______CS______| Code
|______DS______| Data
|______SS______| Stack
|______ES______| Extra
Program status
When the architecture was extended to 32 bits, Intel decided to preserve backward compatibility, so the layout of the general registers stayed the same, but with an "extended" higher part above the lower 16 bits (this probably explains why the letter E prefixes the "old" names):
80386, Pentium
General registers
? = A, B, C or D
Pointers and index
Program status
A similar approach was applied in the transition to the 64-bit architecture.
Intel, AMD 64-bit
General purpose registers: EAX, EBX, ECX, EDX, ESI, EDI
EBP -- Base Pointer
ESP -- Stack pointer
x64 extends x86's 8 general-purpose registers to be 64-bit, and adds 8 new 64-bit registers. The 64-bit registers have names beginning with R, so for example the 64-bit extension of EAX is called RAX. The new registers are named R8 through R15. Why R? Some people say R is from "really extended", which is a nice explanation, but R probably comes simply from "register".
The lower 32, 16, and 8 bits of each register are directly addressable. This includes registers such as ESI, whose lower 8 bits were not previously addressable. The following table specifies the assembly-language names for the lower portions of 64-bit registers:
64-bit register | Lower 32 bits | Lower 16 bits | Lower 8 bits
rax | eax | ax | al
rbx | ebx | bx | bl
rcx | ecx | cx | cl
rdx | edx | dx | dl
rsi | esi | si | sil
rdi | edi | di | dil
rbp | ebp | bp | bpl
rsp | esp | sp | spl
r8 | r8d | r8w | r8b
r9 | r9d | r9w | r9b
r10 | r10d | r10w | r10b
r11 | r11d | r11w | r11b
r12 | r12d | r12w | r12b
r13 | r13d | r13w | r13b
r14 | r14d | r14w | r14b
r15 | r15d | r15w | r15b
Sometimes rXb is written as rXl (l for lower).
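The naming in the table can be illustrated with plain bit masks. The sketch below is only a model written in Python, and the function name sub_registers is made up for this illustration. It shows which part of a 64-bit value the names in one row of the table refer to when the register is read.

```python
def sub_registers(value64):
    """Return the lower 32, 16 and 8 bits of a 64-bit value.

    Models reading eax / ax / al while the full register rax holds value64.
    """
    value64 &= 0xFFFFFFFFFFFFFFFF          # keep the value within 64 bits
    return {
        "lower32": value64 & 0xFFFFFFFF,   # e.g. eax, r8d
        "lower16": value64 & 0xFFFF,       # e.g. ax,  r8w
        "lower8": value64 & 0xFF,          # e.g. al,  r8b
    }


parts = sub_registers(0x1122334455667788)
for name, part in parts.items():
    print(name, hex(part))
```

Each shorter name is just a view of the low-order bits of the same register, which is why changing AL also changes AX, EAX and RAX.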
def int_div(dividend, divisor):
    # Integer division: return the integer part and the remainder
    integer_part = dividend // divisor
    remainder = dividend % divisor
    return (integer_part, remainder)

def print_number(number):
    n = number
    while n > 0:
        integer_part, remainder = int_div(n, 10)
        n = integer_part
        digit = remainder
        print(digit, end='')

number = 12345
print_number(number)
The int_div(dividend, divisor) is a helper function performing integer division of two integers. For example, int_div(27, 10) returns the tuple (2, 7).
As you can see, this algorithm has a drawback: it prints digits in reverse order, so the least significant digit is printed as the leftmost digit. You can fix it easily by introducing a buffer:
def print_number(number):
    n = number
    buffer = ""
    while n > 0:
        integer_part, remainder = int_div(n, 10)
        n = integer_part
        digit = remainder
        # Prepend, so the most significant digit ends up first
        buffer = str(digit) + buffer
    print(buffer)
Now the result is correct:
The DIV instruction (unsigned integer divide) divides the unsigned integer value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using a 64-bit operand is available only in 64-bit mode. This instruction has the following formats:
Operand size | Dividend | Divisor | Quotient | Remainder | Maximum quotient
Word/byte | AX | r/m8 | AL | AH | 255
Doubleword/word | DX:AX | r/m16 | AX | DX | 65,535
Quadword/doubleword | EDX:EAX | r/m32 | EAX | EDX | 2^32 - 1
Doublequadword/quadword | RDX:RAX | r/m64 | RAX | RDX | 2^64 - 1
For example consider the following division:
AX = 10111011 01111110 = 47998_10
BL = 11000000 = 192_10
AX / BL -> AL = 249_10 (11111001), AH = 190_10 (10111110)
AX = 10111110 11111001
However, life is not always so easy. There can be a situation when the quotient (the integer part of the result) does not fit into the designated register:
AX = 11111111 11111111 = 65535_10
BL = 00000010 = 2_10
AX / BL -> quotient 32767_10, remainder 1_10
32767_10 = 01111111 11111111 > 11111111 = FFh
When the result does not fit into the register, the #DE (Divide Error) exception is raised. The same exception is raised when the divisor is 0.
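Both examples above can be replayed with a small model of the byte-sized form of DIV. This is only a sketch in Python: the function name div8 is invented for the illustration, and a Python exception stands in for the processor's #DE.

```python
def div8(ax, divisor):
    """Model of DIV r/m8: divide the 16-bit AX by an 8-bit divisor.

    Returns (AL, AH), that is (quotient, remainder).
    Raises ZeroDivisionError in the cases where the CPU would raise #DE.
    """
    if not 0 <= ax <= 0xFFFF or not 0 <= divisor <= 0xFF:
        raise ValueError("operands out of range for an 8-bit DIV")
    if divisor == 0:
        raise ZeroDivisionError("#DE: division by zero")
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        raise ZeroDivisionError("#DE: quotient does not fit into AL")
    return quotient, remainder


print(div8(47998, 192))        # (249, 190)

try:
    div8(65535, 2)             # quotient 32767 is larger than 255
except ZeroDivisionError as error:
    print(error)
```

In real code the usual way to avoid the overflow is to use a wider form of DIV, which is what the programs in this text do by dividing EDX:EAX by ECX.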
With DIV alone it wouldn't be possible to implement anything, so you need some other instructions. These instructions are: CMP, INC, JMP, JNE, MOV, SUB and XOR. Let's take a look at them one by one.
CMP instruction
CMP first, second compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand (tmp = first - second) and then setting the status flags in the same manner as the SUB instruction: among others, it changes the values of the ZF and CF flags; see the examples below to learn how it works.
Result: ZF = 0 and CF = 1 (first is below second)
Result: ZF = 0 and CF = 0 (first is above second)
Result: ZF = 1 and CF = 0 (both operands are equal)
The CMP instruction is typically used in conjunction with a conditional jump from the Jcc family, a conditional move (CMOVcc family), or a SETcc instruction.
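The three flag combinations listed above can be reproduced with a short model of CMP for unsigned operands. This is a sketch in Python rather than assembly; the name cmp_flags and the sample operands 1 and 2 are chosen only for this illustration, and only ZF and CF are modelled.

```python
def cmp_flags(first, second, bits=32):
    """Model of CMP for unsigned operands: return (ZF, CF).

    ZF is set when first - second is zero (the operands are equal).
    CF is set when the subtraction needs a borrow (first < second).
    """
    mask = (1 << bits) - 1
    first &= mask
    second &= mask
    tmp = (first - second) & mask      # the result itself is discarded by CMP
    zf = 1 if tmp == 0 else 0
    cf = 1 if first < second else 0
    return zf, cf


for first, second in [(1, 2), (2, 1), (2, 2)]:
    zf, cf = cmp_flags(first, second)
    print(f"cmp {first}, {second} -> ZF = {zf}, CF = {cf}")
```

A following JNE only looks at ZF, so it jumps in the first two cases and falls through in the third.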
INC instruction
INC what adds 1 to the destination operand what, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. When the destination is a memory location, the instruction can be used with a LOCK prefix to be executed atomically. A plain register increment looks like this:
inc ebx
JMP instruction
JMP where transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.
JNE instruction
JNE where belongs to the Jcc family of conditional jumps. A Jcc instruction checks the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags are in the specified state (condition), performs a jump to the target instruction specified by the destination operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc instruction. JNE (jump if not equal) jumps when ZF = 0. The JRCXZ, JECXZ, and JCXZ instructions differ from other Jcc instructions because they do not check status flags. Instead, they check RCX, ECX, or CX for 0.
MOV instruction
MOV dst, src copies the second operand (src, the source operand) to the first operand (dst, the destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination operand can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.
SUB instruction
SUB dst, src subtracts the second operand (src, the source operand) from the first operand (dst, the destination operand) and stores the result in the destination operand:
dst := dst - src
The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location (however, two memory operands cannot be used in one instruction). When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.
XOR instruction
XOR dst, src performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location:
dst := dst XOR src
The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location (however, two memory operands cannot be used in one instruction). Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.
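The bit rule has a handy consequence: XOR of a value with itself is always 0, which is why the programs in this text clear a register with xor reg, reg. A minimal sketch in Python (the helper name xor32 is made up for this illustration, and 32-bit operands are assumed):

```python
def xor32(dst, src):
    """Bitwise exclusive OR of two 32-bit values."""
    return (dst ^ src) & 0xFFFFFFFF


print(bin(xor32(0b1100, 0b1010)))   # 0b110: bits differ in two positions
print(xor32(12345, 12345))          # 0: every bit is the same
```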
section .data ; Data section
transTab: db "0123456789" ; Translation table
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
_start:
; Put data to print into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, 12345 ; Set EAX to the number you want to print
jmp printNumber ; Let's print
; Print number code: begin
printNumber:
; Init
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result ; Set EBX part of RBX to point to the beginning of the buffer
printLoop:
; BEGIN: Prepare data
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
inc ebx ; Move to the next byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
sub ebx, result ; Calculate length of a string to print
mov rdx, rbx ; arg3: length of a string to print
mov rsi, result ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
I hope the code with my comments is clear, but I want to comment on one instruction:
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
If you look at the two preceding instructions:
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
you see that EDX contains the remainder of the division. Because you divide by 10, the remainder is in the range from 0 to 9. Notice that when the lowest digit is 5, the remainder is 5, when the lowest digit is 7, the remainder is 7, etc. So the remainder is the position in transTab of the corresponding digit. This is exactly what is calculated in [transTab + edx], where EDX is a position in the table transTab or, more properly, EDX is an offset from the address transTab, which is the beginning of the translation table.
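The table lookup can be sketched in a few lines of Python. The names trans_tab and digit_char are chosen for this illustration only; the table content is the same as in transTab.

```python
trans_tab = b"0123456789"      # same content as transTab


def digit_char(remainder):
    """Return the ASCII code stored at offset `remainder` in the table."""
    return trans_tab[remainder]


for remainder in (0, 5, 9):
    code = digit_char(remainder)
    print(remainder, code, chr(code))
```

Because the digits occupy consecutive ASCII codes, the lookup gives the same value as adding the remainder to the code of '0'; the table simply makes the translation explicit.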
ECX is a 32-bit register, so you transfer from memory 32 bits (4 bytes) starting at address transTab + edx:
div(13,10) -> remainder=3
beginning of transTab
|  transTab+remainder
|  |
0123456789
   |||4th byte to transfer
   ||3rd byte to transfer
   |2nd byte to transfer
   1st byte to transfer
You need only the first byte, but all four bytes will be transferred.
In consequence, executing mov ecx, [transTab + edx] you transfer 4 bytes into ECX. If you then execute
mov [ebx], ecx
you will copy 4 bytes to the buffer starting at the address given in EBX. This will get you into trouble when you are close to the end of the buffer. For example, if you are at the last possible address, then the above instruction will put the 1st byte from the translation table at the last address, but then the 2nd byte at the last+1 address, the 3rd byte at last+2 and finally the 4th byte at last+3. As you can see, this way you will exceed your buffer by 3 bytes and possibly destroy other data. Storing only CL writes exactly one byte.
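The overrun can be made visible with a toy memory model. The sketch below is written in Python under a few assumptions made only for the illustration: memory is a bytearray, the 16-byte buffer is directly followed by four bytes "DATA" that stand for some other variable, and the helper names store and fresh_memory are invented.

```python
trans_tab = b"0123456789"
BUFFER_SIZE = 16


def store(memory, address, remainder, width):
    """Copy `width` bytes, starting at trans_tab[remainder], to memory[address]."""
    padded = trans_tab + bytes(3)          # stand-in for whatever follows the table
    chunk = padded[remainder:remainder + width]
    memory[address:address + width] = chunk


def fresh_memory():
    # 16-byte buffer followed by 4 bytes that belong to some other data
    return bytearray(b"." * BUFFER_SIZE + b"DATA")


one_byte = fresh_memory()
store(one_byte, BUFFER_SIZE - 1, 3, 1)     # like: mov [ebx], cl
print(one_byte)

four_bytes = fresh_memory()
store(four_bytes, BUFFER_SIZE - 1, 3, 4)   # like: mov [ebx], ecx
print(four_bytes)
```

With the one-byte store the neighbouring "DATA" bytes stay intact; with the four-byte store the first three of them are overwritten by '4', '5' and '6'.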
section .data ; Data section
transTab: db "0123456789" ; Translation table
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
_start:
; Put data to print into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, 12345 ; Set EAX to the number you want to print
jmp printNumber ; Let's print
; BEGIN: Print number code
printNumber:
; Init
xor rbx, rbx ; Clear RBX register (set it to 0)
; mov ebx, result ; Set EBX to point to the beginning of the buffer
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer UPDATED
printLoop:
; BEGIN: Prepare data
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
; inc ebx ; Move to the next byte in the buffer
dec ebx ; Move to the previous byte in the buffer UPDATED
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
inc ebx ; EBX was decremented once past the first digit; point it back at the first digit
; BEGIN: Print result buffer
; sub ebx, result ; Calculate length of a string to print
; Calculate length of a string to print UPDATED
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length UPDATED
mov rdx, rax ; arg3: length of a string to print
; mov rsi, result ; arg2: pointer to a string
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string UPDATED
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print number code
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
The code is not much different from the previous version; however, one part needs an explanation. Previously the pointer to the string was loaded with:
mov rsi, result ; arg2: pointer to a string
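Note that RSI is a 64-bit register while result is a numeric constant. This constant is encoded as a 64-bit immediate value, so it fits smoothly into the 64-bit register.
Now the pointer to the string is kept in EBX and it has to end up in the RSI register. Because EBX is a 32-bit register, it fits into ESI, which is the lower part of the 64-bit RSI register. If you simply do:
mov esi, ebx
the lower part of RSI would be equal to EBX. What about the higher part of RSI? It must be equal to 0 for the whole RSI to be equal to a pointer to a string (which is EBX). In 64-bit mode a write to a 32-bit register already zero-extends the result into the full 64-bit register, so mov esi, ebx alone would be enough; this does not hold for 16-bit and 8-bit writes, which leave the upper bits unchanged. To make the intent explicit, the code first clears RSI (sets it to 0) with the XOR instruction and then puts the content of EBX into the lower part of RSI, so for sure RSI would be equal to EBX; hence the sequence of instructions: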
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string UPDATED
section .data ; Data section
section .text
global _start
_start:
mov dx, 0 ; dividend - higher half
mov ax, 16 ; dividend - lower half
mov cx, 5 ; divisor
div cx ; div dx:ax by cx
; Exit
; Use exit code to get result
mov rdi, rax ; Quotient
; or
;mov rdi, rdx ; Remainder
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
As you can see, you divide 16 by 5 and right after this you finish execution by calling the sys_exit system function. Making this call in previous examples you wrote:
mov rdi, 0 ; Exit code, 0=normal
There is nothing against returning anything different than 0: it does not influence the execution of your program, your operating system or anything else. It is just information for a caller, who can interpret the number to decide if the execution was successful or not. Typically 0 means a normal end of the execution, but it is only a convention. You can use this to return either the quotient or the remainder:
[... uncomment quotient, comment remainder ...]
fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 inst_64_div.asm
fulmanp@fulmanp-k2:~/assembler$ ld inst_64_div.o -o inst_64_div
fulmanp@fulmanp-k2:~/assembler$ ./inst_64_div
fulmanp@fulmanp-k2:~/assembler$ echo $?
[... comment quotient, uncomment remainder ...]
fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 inst_64_div.asm
fulmanp@fulmanp-k2:~/assembler$ ld inst_64_div.o -o inst_64_div
fulmanp@fulmanp-k2:~/assembler$ ./inst_64_div
fulmanp@fulmanp-k2:~/assembler$ echo $?
The sequence $? is a special variable in BASH that always holds the return/exit code of the last executed command. You can view it in a terminal by running echo $?.
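The exit code is visible to any parent process, not only to the shell. The sketch below shows this from Python; as an assumption made for the example, a Python one-liner that exits with code 3 stands in for the assembled program, so the snippet runs without NASM.

```python
import subprocess
import sys

# The child process ends with exit code 3, like a program that
# puts 3 into RDI before calling sys_exit.
completed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# The parent reads the code, just as the shell does for $?
print(completed.returncode)
```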
ends execution of your program.
ntp is the label of the memory location designated to keep the number to be printed (ntp: number to print). You also need another "fixed" memory location designated to preserve the address where to return from the printing routine and continue execution; this would be the wtr label (where to return).
section .data ; Data section
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
ntp: dd 0 ; Number to print
; Here you have to move every number
; you want to print
wtr: dq 0 ; Where to return from print call
number1: dd 12345
number2: dd 67890
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
; BEGIN: Print number code
printNumber:
; Init
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
printLoop:
; BEGIN: Prepare data
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
inc ebx ; EBX was decremented once past the first digit; point it back at the first digit
; BEGIN: Print result buffer
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
jmp [wtr] ; Jump to the address saved in wtr (where to return)
; before print call
; END: Print number code
_start:
; Put data `number1` to print into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [number1] ; Set EAX to the number you want to print
mov qword [wtr], cont1 ; Set return address from print routine
jmp printNumber ; "Call" print routine
cont1:
; Put data `number2` to print into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [number2] ; Set EAX to the number you want to print
mov qword [wtr], cont2 ; Set return address from print routine
jmp printNumber ; "Call" print routine
cont2:
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
so you need a space to store all of them. You also need a method to pass a return address to the routine. In the code you have now, this is what you use wtr for. However, it would be nice if you could use some other, well-known address space without using explicit names like wtr. A good idea is to use a stack with its POP and PUSH instructions. The general idea of the stack was explained in XXX, so please look there for clarification if you do not remember. With the stack, the above three-step algorithm will be as follows:
exactly in that order.
BP - base pointer
SP - stack pointer
address BP xxxx
address- 8 BP-[1*8= 8] DATA SP+[7*8=56]
address-16 BP-[2*8=16] WTR SP+[6*8=48]
address-24 BP-[3*8=24] RAX SP+[5*8=40]
address-32 BP-[4*8=32] RBX SP+[4*8=32]
address-40 BP-[5*8=40] RCX SP+[3*8=24]
address-48 BP-[6*8=48] RDX SP+[2*8=16]
address-56 BP-[7*8=56] RSI SP+[1*8= 8]
address-64 BP-[8*8=64] RDI SP
In the stack shown above, access to the RCX register, for example, is possible relative to the top of the stack given by the RSP value: RSP+24.
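The SP column of the table follows a simple rule: every push stores 8 bytes, so an item's offset from RSP is 8 times the number of items pushed after it. A minimal sketch in Python (the function name sp_offsets is made up for this illustration):

```python
def sp_offsets(pushed):
    """Given names in push order, return each name's offset from RSP.

    Every push stores 8 bytes, so the last pushed item sits at RSP+0.
    """
    last = len(pushed) - 1
    return {name: 8 * (last - index) for index, name in enumerate(pushed)}


layout = sp_offsets(["DATA", "WTR", "RAX", "RBX", "RCX", "RDX", "RSI", "RDI"])
print(layout["RCX"])    # 24
print(layout["DATA"])   # 56
```

Adding or removing one push changes every offset, which is the weak point of addressing relative to RSP.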
address |some data | <-- BP, SP Bottom of the stack (BP). It is also tip of the stack ("current" position of the SP).
address- 1| |
address- 2| |
When you push the first data onto the stack, say DATA, the stack will contain:
address |xxxxxxxxxxx| <-- BP bottom of the stack. Address of the first byte you can save on the stack
address- 1|DATA byte 8|
address- 2|DATA byte 7|
address- 3|DATA byte 6|
address- 4|DATA byte 5|
address- 5|DATA byte 4|
address- 6|DATA byte 3|
address- 7|DATA byte 2|
address- 8|DATA byte 1| <-- BP-8 offset by 8 bytes (64 bits) from the bottom of the stack.
This is also a "current" position of the SP
When you push the next data onto the stack, say WTR, it will contain:
address |xxxxxxxxxxx| <-- BP bottom of the stack. Address of the first byte you can save on the stack
address- 1|DATA byte 8|
address- 2|DATA byte 7|
address- 3|DATA byte 6|
address- 4|DATA byte 5|
address- 5|DATA byte 4|
address- 6|DATA byte 3|
address- 7|DATA byte 2|
address- 8|DATA byte 1| <-- BP-8 == SP+8
address- 9|WTR byte 8|
address-10|WTR byte 7|
address-11|WTR byte 6|
address-12|WTR byte 5|
address-13|WTR byte 4|
address-14|WTR byte 3|
address-15|WTR byte 2|
address-16|WTR byte 1| <-- BP-16 offset by 16 bytes (2 x 64 bits) from the bottom of the stack.
This is also a "current" position of the SP
From the above "zoom" you can see that to get byte 1 of DATA you can use either the BP-8 address or SP+8.
section .data ; Data section
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
number1: dd 12345
number2: dd 67890
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
; BEGIN: Print number code
printNumber:
; Init
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rsp+56] ; Set EAX to the number you want to print;
; this number is on the stack
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
printLoop:
; BEGIN: Prepare data
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
inc ebx ; EBX was decremented once past the first digit; point it back at the first digit
; BEGIN: Print result buffer
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return)
; just before the print call (but after all arguments needed
; by routine)
; END: Print number code
_start:
push qword [number1] ; Push into the stack 1st argument: the number to be printed
push qword cont1 ; Push into the stack where to return from routine
jmp printNumber ; "Call" print routine
cont1:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
push qword [number2] ; Push into the stack 1st argument: the number to be printed
push qword cont2 ; Push into the stack where to return from routine
jmp printNumber ; "Call" print routine
cont2:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
Much of the code stays untouched. The changes concern:
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rsp+56] ; Set EAX to the number you want to print;
; this number is on the stack
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return)
; just before the print call (but after all arguments needed
; by routine)
push qword [number1] ; Push into the stack 1st argument: the number to be printed
push qword cont1 ; Push into the stack where to return from routine
jmp printNumber ; "Call" print routine
add rsp, 8 ; Clear the stack - "take out" first element from the stack
push qword [number2] ; Push into the stack 1st argument: the number to be printed
push qword cont2 ; Push into the stack where to return from routine
jmp printNumber ; "Call" print routine
add rsp, 8 ; Clear the stack - "take out" first element from the stack
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rsp+56] ; Set EAX to the number you want to print;
; this number is on the stack
The problem with this code is that the offset to the first argument, which is equal to 56, depends on the "local" values you put on the stack just after the routine call (the sequence of push-es to save registers). If you change them (add another push or remove some of the existing ones), you have to remember to modify the number 56 to the correct value. This is not bad, but you have to remember about it. However, you do not have to, because the location of the arguments does not depend on any local activity on the stack (assuming you do not destroy the stack contents).
The trick is to address the arguments not relative to RSP, which constantly changes, but instead relative to RBP, which points to the bottom of the stack. For this reason, in most cases every function begins with a well-known sequence of instructions called the function prologue:
; Classic function prologue
push rbp
mov rbp, rsp
What this does is to "save" the current position of the base pointer (the bottom of the "current" stack frame) with push rbp and replace it with the stack pointer (the tip/top of the stack) with mov rbp, rsp. So the new base pointer is the current top of the stack.
address old BP xxxx
address- 8 new BP+[2*8=16] DATA SP+[8*8=64]
address-16 new BP+[1*8= 8] WTR SP+[7*8=56]
address-24 new BP RBP SP+[6*8=48] <- save "old" base pointer; from now (new) base pointer = (current) stack pointer
address-32 new BP-[1*8= 8] RAX SP+[5*8=40]
address-40 new BP-[2*8=16] RBX SP+[4*8=32]
address-48 new BP-[3*8=24] RCX SP+[3*8=24]
address-56 new BP-[4*8=32] RDX SP+[2*8=16]
address-64 new BP-[5*8=40] RSI SP+[1*8= 8]
address-72 new BP-[6*8=48] RDI SP
Consequently, if you have a function prologue, you should have a function epilogue to restore the "old" base pointer:
; Classic function epilogue
pop rbp
; Return from routine
With the prologue and epilogue you have to modify only one instruction and replace:
mov eax, [rsp+56] ; Set EAX to the number you want to print;
with:
mov eax, [rbp+16] ; Set EAX to the number you want to print;
With this change you always have access to the first argument at rbp+16, no matter how many registers the routine saves afterwards.
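The difference between the two ways of addressing can be checked with a tiny stack simulation. This is a sketch in Python under stated assumptions: the starting address 0x1000 is arbitrary, the function name first_argument_offsets is invented, and only addresses are tracked, not values.

```python
def first_argument_offsets(saved_registers):
    """Return (offset from RSP, offset from RBP) of the first argument."""
    rsp = 0x1000                      # arbitrary starting address (assumption)
    address_of = {}

    def push(name):
        nonlocal rsp
        rsp -= 8                      # the stack grows towards lower addresses
        address_of[name] = rsp

    push("DATA")                      # caller: the argument
    push("WTR")                       # caller: where to return
    push("RBP")                       # prologue: push rbp
    rbp = rsp                         # prologue: mov rbp, rsp
    for register in saved_registers:  # routine: save registers
        push(register)
    return address_of["DATA"] - rsp, address_of["DATA"] - rbp


print(first_argument_offsets(["rax", "rbx", "rcx", "rdx", "rsi", "rdi"]))
print(first_argument_offsets(["rax", "rbx"]))
```

The first number changes with the count of saved registers (64 for six of them, 32 for two), while the second stays at 16 in both runs.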
section .data ; Data section
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
number1: dd 12345
number2: dd 67890
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global _start
; BEGIN: Print number code
printNumber:
; Init
; Classic function prologue
push rbp
mov rbp, rsp
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rbp+16] ; Set EAX to the number you want to print;
; this number is on the stack
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
printLoop:
; BEGIN: Prepare data
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
inc ebx ; EBX was decremented once past the first digit; point it back at the first digit
; BEGIN: Print result buffer
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
; Classic function epilogue
pop rbp
; Return from routine
jmp [rsp] ; Jump to the address saved at the top of the stack (where to return)
; just before the print call (but after all arguments needed
; by routine)
; END: Print number code
_start:
push qword [number1] ; Push into the stack 1st argument: the number to be printed
push qword cont1 ; Push into the stack where to return from routine
jmp printNumber ; "Call" print routine
cont1:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
push qword [number2] ; Push into the stack 1st argument: the number to be printed
push qword cont2 ; Push into the stack where to return from routine
jmp printNumber ; "Call" print routine
cont2:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
; BEGIN: Exit
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; End of the code
section .data
transTab: db "0123456789" ; Translation table
newLine: db 10 ; Code for printing a new line
sys_exit equ 60
sys_write equ 1
stdout equ 1
section .bss ; Block Starting Symbol section
; It contains uninitialized data.
result: resb 16 ; Reserve space for result.
; Max 16 digit
section .text
global print_number_32
global exit
; BEGIN: Print number code
print_number_32:
; Init
; Classic function prologue
push rbp
mov rbp, rsp
; Save registers as they are before routine execution
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
; Put data to print taken from the stack
; into EDX:EAX
; For simplicity I assume EDX is always equal to 0
mov edx, 0 ; Set EDX to default value
mov eax, [rbp+16] ; Set EAX to the number you want to print;
; this number is on the stack
xor rbx, rbx ; Clear RBX register (set it to 0)
mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer
; BEGIN: Prepare data
printLoop:
mov ecx, 10
div ecx ; Div EDX:EAX by ECX
; EAX = quotient (an integer part)
; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to the remainder to ECX
mov [ebx], cl ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
dec ebx ; Move to the previous byte in the buffer
mov edx, 0 ; Restore default EDX value
cmp eax, 0 ; Compare EAX with immediate value: 0
jne printLoop ; Jump if operands of previous CMP instruction
; are not equal - keep looping until EAX
; is zero which means that all digits are
; converted. When done go to the print part
; BEGIN: Print result buffer
; Calculate length of a string to print
xor rax, rax ; Set RAX to be equal to 0 NEW
mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
sub eax, ebx ; Get the length
mov rdx, rax ; arg3: length of a string to print
xor rsi, rsi ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; BEGIN: Print new line
mov rdx, 1 ; arg3: length of a string to print
mov rsi, newLine ; arg2: pointer to a string
mov rdi, 1 ; arg1: where to write, so called `file descriptor`
; in this case stdout (screen)
mov rax, 1 ; System call number (sys_write)
syscall ; Call a system function
; END: Print new line
; Restore all registers
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
; Classic function epilogue
pop rbp
; Return from routine
jmp [rsp] ; Jump to the return address stored at the top of the stack;
; the caller pushed it just before jumping to the routine
; (after all arguments needed by the routine)
; END: Print number code
; BEGIN: Exit
exit:
mov rdi, 0 ; Exit code, 0=normal
mov rax, 60 ; System call number (sys_exit)
syscall ; Call a system function
; END: Exit
; File: main.asm
section .data ; Data section
number1: dd 12345
number2: dd 67890
section .text
extern print_number_32
extern exit
global _start
_start:
push qword [number1] ; Push the 1st argument onto the stack: the number to be printed
push qword cont1 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont1:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
push qword [number2] ; Push the 1st argument onto the stack: the number to be printed
push qword cont2 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont2:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
jmp exit
; End of the code
Compilation and execution result:
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ nasm -f elf64 routine_print.asm -o routine_print.o
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ nasm -f elf64 main.asm -o main.o
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ ld main.o routine_print.o -o print_test
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ ./print_test
The manual "call" used so far (push the return address, then jump) can be replaced with the CALL and RET instructions. In main.asm replace
push qword [number1] ; Push the 1st argument onto the stack: the number to be printed
push qword cont1 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont1:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
push qword [number2] ; Push the 1st argument onto the stack: the number to be printed
push qword cont2 ; Push onto the stack the address to return to from the routine
jmp print_number_32 ; "Call" print routine
cont2:
add rsp, 8 ; Clear the stack - "take out" first element from the stack
jmp exit
; End of the code
with
push qword [number1] ; Push the 1st argument onto the stack: the number to be printed
call print_number_32 ; Call print routine (CALL pushes the return address itself)
add rsp, 8 ; Clear the stack - "take out" first element from the stack
push qword [number2] ; Push the 1st argument onto the stack: the number to be printed
call print_number_32 ; Call print routine (CALL pushes the return address itself)
add rsp, 8 ; Clear the stack - "take out" first element from the stack
call exit
; End of the code
and in routine_print.asm replace
jmp [rsp]
with
ret
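To see in isolation what CALL and RET do to the stack, here is a minimal, self-contained sketch for Linux x86-64 in NASM syntax. The file name call_ret_sketch.asm, the routine name print_msg and the message text are made up for this illustration and are not part of the files above. The routine takes one argument from the stack (the number of bytes to print), in the same style as the print routine.

```nasm
; call_ret_sketch.asm (hypothetical file name, illustration only)
; Assumed build steps, same tools as above:
;   nasm -f elf64 call_ret_sketch.asm -o call_ret_sketch.o
;   ld call_ret_sketch.o -o call_ret_sketch

section .data
msg:    db "ok", 10         ; Text to print, followed by a new line
msgLen  equ $ - msg         ; Length of the text in bytes (3)

section .text
global _start

; print_msg: hypothetical routine, prints the first N bytes of msg
; Stack on entry: [rsp]   = return address (pushed by CALL)
;                 [rsp+8] = N, the argument pushed by the caller
print_msg:
    push rbp                ; Classic function prologue
    mov rbp, rsp
    mov rdx, [rbp+16]       ; arg3: length, taken from the stack
                            ; ([rbp] = saved RBP, [rbp+8] = return address)
    mov rsi, msg            ; arg2: pointer to a string
    mov rdi, 1              ; arg1: file descriptor, 1 = stdout
    mov rax, 1              ; System call number (sys_write)
    syscall
    pop rbp                 ; Classic function epilogue
    ret                     ; Pop the return address from the stack and jump to it

_start:
    push qword msgLen       ; Push the argument onto the stack
    call print_msg          ; Push the address of the next instruction, then jump
    add rsp, 8              ; RET already removed the return address,
                            ; so this removes the argument
    mov rdi, 0              ; Exit code, 0=normal
    mov rax, 60             ; System call number (sys_exit)
    syscall
```

Unlike jmp [rsp], which leaves the return address on the stack, RET removes it, so after the routine returns the only thing left for the caller to clean up is the argument it pushed.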
Sometimes, mostly when you use online compilers, you have to keep all the code in one file. For this case you can download the final version of the print routine in a single file.