fulmanski.pl: tutorials

Chapter 4

Second program

Initial version: 2025-02-25
Last update: 2025-03-02

In this chapter you will write your second program. It is quite an important piece of code because it allows you to print numbers, so you can use it to verify the effect of executing other instructions. You will learn a few instructions needed to implement a printing routine, as well as how to call a routine using a stack frame and a pair of CALL and RET instructions.

Registers in x86 architectures

To use assembler you have to correctly name the chunks of data you want to operate on: bytes, words, etc. In x86 the size very often follows from the operand you use, in particular in the case of registers. That is why it is very important to know the basic register structure.

At the beginning, in the 8086 era, registers were organized as follows:


General registers

111111
5432109876543210
|______AX______|  Accumulator
|__AH__||__AL__|

|______BX______|  Base
|__BH__||__BL__|

|______CX______|  Count
|__CH__||__CL__|

|______DX______|  Data
|__DH__||__DL__|

Pointers and index

|______SP______|  Stack Pointer
|______BP______|  Base Pointer
|______SI______|  Source Index
|______DI______|  Destination Index

Segment

|______CS______|  Code
|______DS______|  Data
|______SS______|  Stack
|______ES______|  Extra

Program status

|____FLAGS_____|  
|______IP______|  Instruction Pointer (program counter)

When the architecture was extended to 32 bits, Intel decided to preserve backward compatibility, so the general register layout stayed the same, but with an "extended" higher part above 16 bits (this probably explains why the letter E prefixes the "old" names):


80386, Pentium

General registers

3322222222221111111111
10987654321098765432109876543210
|______________E?X_____________|
                |______?X______|
                |__?H__||__?L__|

? = A, B, C or D

Pointers and index

|______________ESP_____________|
|______________EBP_____________|
|______________ESI_____________|
|______________EDI_____________|

Program status

|_____________EFLAGS___________|  
|______________EIP_____________|

A similar approach was applied in the transition to the 64-bit architecture.


Intel, AMD 64-bit

666655555555554444444444333333333322222222221111111111
3210987654321098765432109876543210987654321098765432109876543210
|      ||      ||      ||      ||      ||      ||      ||      |
|__64__||__56__||__48__||__40__||__32__||__24__||__16__||__8___|
|_____________________________RAX______________________________|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|______________EAX_____________|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|______AX______|
|xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|__AH__||__AL__|

RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP, R8-R15
General purpose registers: EAX, EBX, ECX, EDX, ESI, EDI
EBP -- Base Pointer
ESP -- Stack pointer

x64 extends x86's 8 general-purpose registers to 64 bits and adds 8 new 64-bit registers. The 64-bit registers have names beginning with R, so for example the 64-bit extension of EAX is called RAX. The new registers are named R8 through R15. Why R? Some people say R comes from "really extended", which is a nice explanation, but R probably comes simply from "register".

The lower 32 bits, 16 bits, and 8 bits of each register are directly addressable in operands. This includes registers, like ESI, whose lower 8 bits were not previously addressable. The following table specifies the assembly-language names for the lower portions of 64-bit registers:


64-bit register | Lower 32 bits | Lower 16 bits | Lower 8 bits
==============================================================
rax             | eax           | ax            | al
rbx             | ebx           | bx            | bl
rcx             | ecx           | cx            | cl
rdx             | edx           | dx            | dl
rsi             | esi           | si            | sil
rdi             | edi           | di            | dil
rbp             | ebp           | bp            | bpl
rsp             | esp           | sp            | spl
r8              | r8d           | r8w           | r8b
r9              | r9d           | r9w           | r9b
r10             | r10d          | r10w          | r10b
r11             | r11d          | r11w          | r11b
r12             | r12d          | r12w          | r12b
r13             | r13d          | r13w          | r13b
r14             | r14d          | r14w          | r14b
r15             | r15d          | r15w          | r15b

Sometimes rXb is called rXl (l for lower).
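
To make the relation between a 64-bit register and its named lower parts more tangible, here is a minimal Python sketch. It only models reading the parts with bit masks; the function name sub_registers and the sample value are my own choices for illustration, not anything defined by the processor.

```python
# Model a 64-bit register as a Python integer and read its named lower parts.
def sub_registers(rax):
    """Return the values visible as RAX, EAX, AX, AH and AL."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep exactly 64 bits
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,       # lower 32 bits
        "ax":  rax & 0xFFFF,           # lower 16 bits
        "ah":  (rax >> 8) & 0xFF,      # bits 15..8
        "al":  rax & 0xFF,             # lower 8 bits
    }

parts = sub_registers(0x1122334455667788)
for name, value in parts.items():
    print(f"{name} = {value:#x}")
```

The same masks apply to the other registers from the table, with sil, dil, bpl, spl and rXb playing the role of al.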

How to print a number

From the preceding chapter you know how to print a sequence of characters, that is, a string. Numbers are stored in a computer not as strings but encoded as binary sequences. Your task is to turn the binary format into a readable sequence of digits that can be printed. Because you are at the lowest possible level, there is no ready-made instruction for this, so you have to write it on your own. Your goal at this moment is to implement an algorithm that prints a number.

The idea of an algorithm printing a number at given address

Below I present an algorithm for printing a number (to be more precise, an unsigned integer) written in Python. Python is simple enough to resemble pseudo-code, but of course it can be executed to verify its correctness.

Here you have the code:


def int_div(dividend, divisor):
  # Integer division: returns the tuple (quotient, remainder)
  integer_part = dividend // divisor
  remainder = dividend % divisor
  return (integer_part, remainder)
  
def print_number(number):
  n = number
  while n > 0:
    integer_part, remainder = int_div(n, 10)
    n = integer_part
    digit = remainder
    
    print(digit, end='')

number = 12345
print_number(number)

The int_div(dividend, divisor) is a helper function performing integer division of two integers. For example int_div(27, 10) returns tuple (2, 7).

When executed it prints:


54321

As you can see, this algorithm has a drawback: it prints the digits in reverse order, with the least significant digit printed leftmost. You can fix it easily by introducing a buffer:


def print_number(number):
  n = number
  buffer = ""
  while n > 0:
    integer_part, remainder = int_div(n, 10)
    n = integer_part
    digit = remainder
    buffer = str(digit) + buffer  # Prepend, so later (more significant) digits end up first
    
  print(buffer)

Now the result is correct:


12345

Division instruction

From the code presented in the previous section you know that you need a division instruction. Division is one of the most difficult mathematical operations you may have learned in school (do you remember how painful dividing was?), yet unexpectedly it is the first instruction you learn about.

The DIV (unsigned integer divide) instruction divides the unsigned integer value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (the dividend) by the source operand (the divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using a 64-bit operand is available only in 64-bit mode. This instruction has the following formats:


Operand Size     Dividend Divisor Quotient Remainder Maximum Quotient
Word/byte        AX       r/m8    AL       AH        255
Doubleword/word  DX:AX    r/m16   AX       DX        65,535
Quadword/
doubleword       EDX:EAX  r/m32   EAX      EDX       2^32 - 1
Doublequadword/
quadword         RDX:RAX  r/m64   RAX      RDX       2^64 - 1

For example consider the following division:


AX = 10111011 01111110 = 47998_10
BL = 11000000          = 192_10

AX / BL -> AL = 249_10 (11111001), AH = 190_10 (10111110)

AX = 10111110 11111001

However, life is not always so easy. There can be a situation when the quotient (the integer part of the result) does not fit into the designated register:


AX = 11111111 11111111 = 65535_10
BL = 00000010          = 2_10

AX / BL -> quotient = 32767_10, remainder = 1_10

32767_10 = 01111111 11111111 > 11111111 = FFh = 255_10 (the maximum AL can hold)

When the result does not fit into the register, the #DE (Divide Error) exception is raised. The same exception is raised when the divisor is 0.
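
The behavior of DIV, including the overflow case, can be sketched in Python. This is only a model under my own assumptions: the names div and DivideError are made up for illustration, and a Python exception stands in for the processor's #DE.

```python
class DivideError(Exception):
    """Stands in for the processor's #DE exception (hypothetical name)."""

def div(dividend, divisor, bits):
    """Model unsigned DIV for a divisor of `bits` bits (8, 16, 32 or 64).

    The dividend is twice as wide as the divisor.
    Returns the tuple (quotient, remainder).
    """
    if divisor == 0:
        raise DivideError("division by zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > (1 << bits) - 1:
        raise DivideError("quotient does not fit into the destination register")
    return quotient, remainder

print(div(47998, 192, 8))   # first example: quotient goes to AL, remainder to AH
try:
    div(65535, 2, 8)        # second example: 32767 does not fit into AL
except DivideError as error:
    print("#DE:", error)
```

The first call prints (249, 190), matching the AL and AH values from the first example.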

To be honest, with DIV alone it wouldn't be possible to implement anything, so you need some other instructions. These instructions are: CMP, INC, JMP, JNE, MOV, SUB and XOR. Let's take a look at them one by one.

CMP instruction

CMP first, second compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the result. The comparison is performed by subtracting the second operand from the first operand (tmp = first - second) and then setting the status flags in the same manner as the SUB instruction; the result of the subtraction itself is discarded. Among other flags, it changes the values of the ZF and CF flags; see the examples below to get to know how it works.

Case 1: AX < BX


MOV AX,5
MOV BX,8
CMP AX,BX

Result: ZF = 0 and CF = 1

Case 2: AX > BX


MOV AX,8
MOV BX,5
CMP AX,BX

Result: ZF = 0 and CF = 0

Case 3: AX = BX


MOV AX,5
MOV BX,AX
CMP AX,BX

Result: ZF = 1 and CF = 0

When an immediate value is used as an operand, it is sign-extended to the length of the first operand.

The CMP instruction is typically used in conjunction with a conditional jump from the Jcc family, a conditional move (CMOVcc family), or a SETcc instruction.
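
The three cases above can be reproduced with a small Python model. It is a sketch that tracks only ZF and CF for unsigned operands of a given width; the function name cmp_flags is an assumption of mine.

```python
def cmp_flags(first, second, bits=16):
    """Model `CMP first, second`: return (ZF, CF) for unsigned operands."""
    mask = (1 << bits) - 1
    first &= mask
    second &= mask
    result = (first - second) & mask   # tmp = first - second, never stored
    zf = int(result == 0)              # set when both operands are equal
    cf = int(first < second)           # set when the subtraction needs a borrow
    return zf, cf

for ax, bx in [(5, 8), (8, 5), (5, 5)]:
    zf, cf = cmp_flags(ax, bx)
    print(f"AX={ax} BX={bx} -> ZF={zf} CF={cf}")
```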

INC instruction

INC what adds 1 to the destination operand what, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag.

Note that this instruction can be used with a LOCK prefix to be executed atomically, but only when the destination operand is a memory location (a LOCK prefix used with a register destination raises an invalid-opcode exception):


lock inc dword [rbx]

JMP instruction

JMP where – transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.

JNE instruction

JNE where (jump if not equal) belongs to the Jcc family of conditional jumps. Instructions of this family check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags are in the specified state (condition), perform a jump to the target instruction specified by the destination operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for; for JNE the condition is ZF = 0, which after a CMP means that the compared operands were not equal. If the condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc instruction.

The JRCXZ, JECXZ, and JCXZ instructions differ from other Jcc instructions because they do not check status flags. Instead, they check RCX, ECX or CX for 0.

MOV instruction

MOV dst, src copies the second operand (src, the source operand) to the first operand (dst, the destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination operand can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.

SUB instruction

SUB dst, src subtracts the second operand (src – source operand) from the first operand (dst – destination operand) and stores the result in the destination operand:


dst := dst - src

The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location (however, two memory operands cannot be used in one instruction). When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically, provided that the destination operand is a memory location.

XOR instruction

XOR dst, src performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location:


dst := dst XOR src

The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location (however, two memory operands cannot be used in one instruction). Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically, provided that the destination operand is a memory location.
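
A short Python sketch of the XOR rule, and of why XOR-ing a register with itself always gives 0 (the listings later in this chapter use this to clear registers). The helper name xor and the sample values are assumptions made for illustration.

```python
def xor(dst, src, bits=64):
    """Model `XOR dst, src` on registers of the given width."""
    mask = (1 << bits) - 1
    return (dst ^ src) & mask

# Different bits give 1, equal bits give 0.
print(format(xor(0b1100, 0b1010), "04b"))

# Every bit is equal to itself, so the result is always 0.
rbx = 0x1122334455667788
print(xor(rbx, rbx))
```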

Implementing printing method – first approach

Although you know how to print digits in the correct order, first you will implement the "reversed" version as it is simpler. Next you will modify it a little bit to get the final solution. Please spend some time to understand how my code works; I added as many comments, as descriptive as I could. If something is not clear enough, please let me know and I will definitely fix it.


section .data               ; Data section

transTab: db "0123456789"   ; Translation table

section .bss                ; Block Started by Symbol section
                            ; It contains uninitialized data.

result: resb 16             ; Reserve space for result.
                            ; Max 16 digit

section .text

global  _start
 
_start:
                            ; Put data to print into
                            ; EDX:EAX
                            ; For simplicity I assume EDX is always equal to 0
        mov edx, 0          ; Set EDX to default value
        mov eax, 12345      ; Set EAX to the number you want to print
        jmp printNumber     ; Let's print

; Print number code: begin
; Init        
printNumber:
        xor rbx, rbx        ; Clear RBX register (set it to 0)
        mov ebx, result     ; Set EBX part of RBX to point to the beginning of the buffer

; BEGIN: Prepare data        
printLoop:                 
        mov ecx, 10
        div ecx             ; Div EDX:EAX by ECX
                            ; EAX = quotient (an integer part)
                            ; EDX = remainder
        mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
        mov [ebx], cl       ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
        inc ebx             ; Move to the next byte in the buffer
        mov edx, 0          ; Restore default EDX value
        cmp eax, 0          ; Compare EAX with immediate value: 0
        
        jne printLoop       ; Jump if operands of previous CMP instruction
                            ; are not equal - keep looping until EAX
                            ; is zero which means that all digits are
                            ; converted. When done go to the print part
        
; BEGIN: Print result buffer
print:
        sub     ebx, result ; Calculate length of a string to print
        mov     rdx, rbx    ; arg3: length of a string to print
        mov     rsi, result ; arg2: pointer to a string
        mov     rdi, 1      ; arg1: where to write, so called `file descriptor`
                            ; in this case stdout (screen)
        mov     rax, 1      ; System call number (sys_write)
        syscall             ; Call a system function
 
; BEGIN: Exit
        mov     rdi, 0      ; Exit code, 0=normal
        mov     rax, 60     ; System call number (sys_exit)
        syscall             ; Call a system function
; End of the code

I hope the code with my comments is clear, but I want to comment on one instruction:


mov [ebx], cl      ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer

If you look into two preceding instructions:


div ecx             ; Div EDX:EAX by ECX
                    ; EAX = quotient (an integer part)
                    ; EDX = remainder
mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX

you see that EDX contains the remainder of the division. Because you divide by 10, the remainder is in the range from 0 to 9. Notice that when the lowest digit is 5, the remainder is 5, when the lowest digit is 7, the remainder is 7, etc. So the remainder is the position of the corresponding digit in transTab. This is exactly what is calculated in [transTab + edx], where EDX is a position in the table transTab or, more properly, an offset from the address transTab, which is the beginning of the translation table.

Because ECX is a 32-bit register, you transfer 32 bits from memory, starting at address transTab + edx:


div(13,10) -> remainder=3

beginning of transTab
|
|  transTab+remainder
|  |
0123456789
   ||||
   |||4th byte to transfer
   |||
   ||3rd byte to transfer
   ||
   |2nd byte to transfer
   |
   1st byte to transfer
   
You need only the first byte but all four bytes
will be transferred.

As a consequence, executing mov ecx, [transTab + edx] transfers 4 bytes into ECX.

Now if you do:


mov [ebx], ecx

you will copy 4 bytes to the buffer starting at the address given in EBX. This will get you into trouble when you are close to the end of the buffer. For example, if you are at the last possible address, then the above instruction will put the 1st byte from the translation table at the last address, but then the 2nd byte at address last+1, the 3rd byte at last+2 and finally the 4th byte at last+3. As you can see, this way you will exceed your buffer by 3 bytes and possibly destroy other data.

Problem to solve: By the way, it would be a good exercise for you to write a program verifying this behavior.
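
Before you try this in assembler, the effect can be simulated in Python with a bytearray playing the role of memory. This is only a sketch: the "OTHER DATA" bytes placed right after the 16-byte buffer are an assumption of mine, standing for whatever happens to live after result in a real program.

```python
trans_tab = b"0123456789"

def load4(table, offset):
    """Model `mov ecx, [transTab + edx]`: read 4 bytes starting at offset."""
    return table[offset:offset + 4]

def store(memory, address, data):
    """Write all bytes of `data` into `memory`, starting at `address`."""
    memory[address:address + len(data)] = data

# A 16-byte buffer followed by some neighbouring data.
memory = bytearray(b"." * 16 + b"OTHER DATA")
ecx = load4(trans_tab, 3)         # remainder 3 -> b"3456"
last = 15                         # last valid address of the buffer

one_byte = bytearray(memory)
store(one_byte, last, ecx[:1])    # like: mov [ebx], cl
print(bytes(one_byte))

four_bytes = bytearray(memory)
store(four_bytes, last, ecx)      # like: mov [ebx], ecx
print(bytes(four_bytes))
```

In the second printout the first three bytes of the neighbouring data are overwritten with 4, 5 and 6.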

Implementing printing method – final approach


section .data              ; Data section

transTab: db "0123456789"  ; Translation table

section .bss               ; Block Started by Symbol section
                           ; It contains uninitialized data.

result: resb 16            ; Reserve space for result.
                           ; Max 16 digit

section .text

global  _start
 
_start:
                           ; Put data to print into
                           ; EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, 12345     ; Set EAX to the number you want to print
        jmp printNumber    ; Let's print

; BEGIN: Print number code
; Init        
printNumber:
        xor rbx, rbx       ; Clear RBX register (set it to 0)
        ; mov ebx, result  ; Set EBX to point to the beginning of the buffer
        mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer UPDATED

; BEGIN: Prepare data        
printLoop:                 
        mov ecx, 10
        div ecx            ; Div EDX:EAX by ECX
                           ; EAX = quotient (an integer part)
                           ; EDX = remainder
        mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
        mov [ebx], cl      ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
        ; inc ebx            ; Move to the next byte in the buffer
        dec ebx            ; Move to the previous byte in the buffer UPDATED
        mov edx, 0         ; Restore default EDX value
        cmp eax, 0         ; Compare EAX with immediate value: 0
        
        jne printLoop      ; Jump if operands of previous CMP instruction
                           ; are not equal - keep looping until EAX
                           ; is zero which means that all digits are
                           ; converted. When done go to the print part
        
; BEGIN: Print result buffer
print:
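        ; sub ebx, result  ; Calculate length of a string to print
        inc ebx            ; After the loop EBX points one byte before the
                           ; first digit, so move it to the first digit NEW
                           ; Calculate length of a string to print UPDATED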
        xor rax, rax       ; Set RAX to be equal to 0 NEW
        mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
        sub eax, ebx       ; Get the length UPDATED
        mov rdx, rax       ; arg3: length of a string to print
        ; mov rsi, result    ; arg2: pointer to a string
        xor rsi, rsi       ; Clear RSI register (set it to 0) NEW; see explanation below
        mov esi, ebx       ; arg2: pointer to a string UPDATED
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function
; END: Print number code

; BEGIN: Exit
        mov rdi, 0         ; Exit code, 0=normal
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
; End of the code

The code is not much different from the previous version; however, one part needs an explanation.

In the first version there is an instruction:


mov rsi, result ; arg2: pointer to a string

Note that RSI is a 64-bit register while result is a numeric constant (the address of the buffer). This constant is encoded as a 64-bit immediate value, so it fits smoothly into the 64-bit register.

In the second version the pointer to the string is in the EBX register. Because EBX is a 32-bit register, it fits into ESI, which is the lower part of the 64-bit RSI register. If you simply do:


mov esi, ebx

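the lower part of RSI will be equal to EBX. What about the higher part of RSI? It has to be equal to 0 so that the whole RSI is equal to the pointer to the string (which is in EBX). In 64-bit mode the processor takes care of this: writing to a 32-bit register such as ESI automatically clears the upper 32 bits of the corresponding 64-bit register. This is not the case for 8-bit and 16-bit writes (to SIL or SI, for example), which leave the remaining bits unchanged.

So clearing RSI explicitly with the XOR instruction is not strictly necessary here, but it is harmless and makes the intention obvious. After the following sequence of instructions RSI is for sure equal to EBX: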


xor rsi, rsi       ; Clear RSI register (set it to 0) NEW; see explanation below
mov esi, ebx       ; arg2: pointer to a string UPDATED
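
The idea behind the final version (fill a fixed buffer from its end, then print only the used part) can also be sketched in Python. This mirrors the assembler structure only loosely: the names BUFFER_SIZE and format_number are mine, and the sketch assumes the number has at most 16 digits. Note that after the loop the position index sits one slot before the first digit, so the sketch steps forward by one before computing the length.

```python
BUFFER_SIZE = 16                       # like: result: resb 16

def format_number(number):
    """Convert an unsigned integer to ASCII digits, filling from the end."""
    buffer = bytearray(BUFFER_SIZE)
    position = BUFFER_SIZE - 1         # like: mov ebx, result+15
    n = number
    while True:
        n, remainder = divmod(n, 10)   # like: div ecx
        buffer[position] = b"0123456789"[remainder]
        position -= 1                  # like: dec ebx
        if n == 0:                     # like: cmp eax, 0 / jne printLoop
            break
    first_digit = position + 1         # position stopped one slot too early
    length = BUFFER_SIZE - first_digit
    return bytes(buffer[first_digit:first_digit + length])

print(format_number(12345))
print(format_number(0))
```

Because the check comes at the end of the loop, as in the assembler code, the number 0 is printed as a single digit rather than as an empty string.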

Implementing "printing" method – hacker's solution

Printing is the way you can "return" a result to the user. However, with some additional assumptions (for example limiting what you print) you can return some results in a different way. Please look at the following code:


section .text              ; Code section

global  _start             
 
_start:         
        mov     dx, 0      ; dividend - higher half           
        mov     ax, 16     ; dividend - lower half
        mov     cx, 5      ; divisor
        div     cx         ; div dx:ax by cx
 
; Exit
                           ; Use exit code to get result
        mov     rdi, rax   ; Quotient
                           ; or
       ;mov     rdi, rdx   ; Remainder
        mov     rax, 60    ; System call number (sys_exit)
        syscall            ; Call a system function
; End of the code

As you can see, you divide 16 by 5 and right after this you finish execution by calling the sys_exit system function. When making this call in the previous examples you wrote:


mov rdi, 0         ; Exit code, 0=normal

There is nothing wrong with returning something other than 0; it does not influence the execution of your program, your operating system or anything else. It is just information for the caller, who can interpret the number to decide whether execution was successful or not. Typically 0 means a normal end of execution, but it is only a convention.

So, in the above code you use sys_exit to return either quotient or remainder:


[... uncomment quotient, comment remainder ...]
fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 inst_64_div.asm
fulmanp@fulmanp-k2:~/assembler$ ld inst_64_div.o -o inst_64_div
fulmanp@fulmanp-k2:~/assembler$ ./inst_64_div 
fulmanp@fulmanp-k2:~/assembler$ echo $?
3
[... comment quotient, uncomment remainder ...]
fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 inst_64_div.asm
fulmanp@fulmanp-k2:~/assembler$ ld inst_64_div.o -o inst_64_div
fulmanp@fulmanp-k2:~/assembler$ ./inst_64_div 
fulmanp@fulmanp-k2:~/assembler$ echo $?
1

The sequence $? is a special variable in BASH that always holds the return/exit code of the last executed command. You can view it in a terminal by running echo $?.

The clear drawback of this approach is that you can return only one number and calling sys_exit ends execution of your program.
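
There is one more limitation worth knowing: on Linux only the lowest 8 bits of the value passed to sys_exit reach the caller, so echo $? can show only numbers from 0 to 255. A minimal Python model of this truncation (the function name visible_exit_status is my own):

```python
def visible_exit_status(value):
    """Return what `echo $?` would show for a value passed to sys_exit."""
    return value & 0xFF                # only the lowest 8 bits survive

for value in (3, 1, 255, 256, 300):
    print(value, "->", visible_exit_status(value))
```

So returning 300 this way would be reported as 44, and 256 as 0.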

Into the function

The code you have just completed allows you to print a number on the screen, so you can treat it as a PoC (Proof of Concept) of how it can be done. However, you cannot treat it as a useful tool because it prints only a number from a given address. Your next task is to modify the code so that you can call it for different numbers located at different addresses.

Non-function solution

To tell the truth, you can easily print different numbers assuming that every number is first copied into one, very specific memory location. The code below applies this approach, where ntp is the label of the memory location designated to keep the number to be printed (ntp: number to print). You also need another fixed memory location, designated to hold the address where to return from the printing routine and continue execution; this is the wtr label (where to return). Note that in the listing below the number itself is handed to the routine in EAX, so ntp is declared but not actually used.


section .data              ; Data section

transTab: db "0123456789"  ; Translation table
newLine:  db 10            ; Code for printing a new line

ntp:     dd 0              ; Number to print
                           ; Here you have to move every number
                           ; you want to print
wtr:     dq 0              ; Where to return from print call

number1: dd 12345
number2: dd 67890

section .bss               ; Block Started by Symbol section
                           ; It contains uninitialized data.

result: resb 16            ; Reserve space for result.
                           ; Max 16 digit

section .text

global  _start

; BEGIN: Print number code
; Init        
printNumber:
        xor rbx, rbx       ; Clear RBX register (set it to 0)
        mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer

; BEGIN: Prepare data        
printLoop:                 
        mov ecx, 10
        div ecx            ; Div EDX:EAX by ECX
                           ; EAX = quotient (an integer part)
                           ; EDX = remainder
        mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
        mov [ebx], cl      ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
        dec ebx            ; Move to the previous byte in the buffer
        mov edx, 0         ; Restore default EDX value
        cmp eax, 0         ; Compare EAX with immediate value: 0
        
        jne printLoop      ; Jump if operands of previous CMP instruction
                           ; are not equal - keep looping until EAX
                           ; is zero which means that all digits are
                           ; converted. When done go to the print part
        
; BEGIN: Print result buffer
print:
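        inc ebx            ; After the loop EBX points one byte before the
                           ; first digit, so move it to the first digit
                           ; Calculate length of a string to print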
        xor rax, rax       ; Set RAX to be equal to 0 NEW
        mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST – SRC) NEW
        sub eax, ebx       ; Get the length
        mov rdx, rax       ; arg3: length of a string to print
        xor rsi, rsi       ; Clear RSI register (set it to 0) NEW; see explanation below
        mov esi, ebx       ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function

        ; BEGIN: Print new line
        mov rdx, 1         ; arg3: length of a string to print
        mov rsi, newLine   ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function
        ; END: Print new line

        jmp [wtr]          ; Jump to the address saved in wtr (where to return)
                           ; before print call
; END: Print number code

_start:
                           ; Put data `number1` to print into
                           ; EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [number1] ; Set EAX to the number you want to print
        mov qword [wtr], cont1 ; Set return address from print routine
        jmp printNumber    ; "Call" print routine

cont1:
                           ; Put data `number2` to print into
                           ; EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [number2] ; Set EAX to the number you want to print
        mov qword [wtr], cont2 ; Set return address from print routine
        jmp printNumber    ; "Call" print routine

cont2: 
; BEGIN: Exit
        mov rdi, 0         ; Exit code, 0=normal
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
; End of the code

What is wrong with the code you have now

The code you have now works, but it behaves as if it were the only code in your future program. I mean that the printing routine uses any register it wants without caring whether they store any data or not. In a real program, which does a lot of things, printing is just one piece of a big jigsaw puzzle and every routine must take care of the registers it uses. That means that a routine should follow a simple algorithm:

Make a copy of every register the routine is going to use.
Do what the routine should do; at this moment it is safe to use any of the saved registers.
Restore all the registers.

In your code the printing routine uses the following registers: RAX, RBX, RCX, RDX, RSI, RDI, so you need space to store all of them. You also need a method to pass a return address to the routine. In the code you have now this is what you use wtr for. However, it would be nice if you could use some other, well-known address space without using explicit names like wtr. A good idea is to use the stack with its POP and PUSH instructions. The general idea of the stack was explained in XXX, so please look there for clarification if you do not remember. With the stack, the above three-step algorithm becomes as follows:

Push onto the stack any data you need in the routine.
Push onto the stack a return address.
Jump to the printing routine.
At the beginning of the printing routine make a copy of every register you want to use: push registers RAX, RBX, RCX, RDX, RSI, RDI onto the stack, exactly in that order.
Do what the routine should do.
At the end of the printing routine restore all the registers: pop the values from the stack and move them to the registers in the reverse of the order in which you pushed them.
Pop the return address from the stack.
Jump to the popped address.
Move the stack pointer to free the space occupied by the data.

So the stack will be:


BP - base pointer
SP - stack pointer

address    BP          xxxx
address- 8 BP-[1*8= 8] DATA   SP+[7*8=56]
address-16 BP-[2*8=16] WTR    SP+[6*8=48]
address-24 BP-[3*8=24] RAX    SP+[5*8=40]
address-32 BP-[4*8=32] RBX    SP+[4*8=32]
address-40 BP-[5*8=40] RCX    SP+[3*8=24]
address-48 BP-[6*8=48] RDX    SP+[2*8=16]
address-56 BP-[7*8=56] RSI    SP+[1*8= 8]
address-64 BP-[8*8=64] RDI    SP

With the stack laid out as above, access to the RCX register, for example, is possible relative to the top of the stack given by the RSP value: RSP+24.
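
The stack layout and the SP-relative offsets can be checked with a tiny Python simulation. It is a sketch under simplifying assumptions: every element is 8 bytes wide, the names of the pushed items are stored instead of real values, and the Stack class with its base address 0x1000 is made up for illustration.

```python
class Stack:
    """A downward-growing stack of 8-byte slots, addressed relative to SP."""

    def __init__(self, base):
        self.sp = base                 # SP starts at the bottom of the stack
        self.memory = {}

    def push(self, value):
        self.sp -= 8                   # the stack grows towards lower addresses
        self.memory[self.sp] = value

    def pop(self):
        value = self.memory.pop(self.sp)
        self.sp += 8
        return value

    def peek(self, offset):
        """Read the slot at SP + offset without changing SP."""
        return self.memory[self.sp + offset]

stack = Stack(base=0x1000)
for name in ["DATA", "WTR", "RAX", "RBX", "RCX", "RDX", "RSI", "RDI"]:
    stack.push(name)

print(stack.peek(24))                  # slot at SP+24
print(stack.peek(56))                  # slot at SP+56

restored = [stack.pop() for _ in range(6)]
print(restored)                        # registers come back in reverse order
```

After the six pops the return address (WTR) is on the top of the stack, which is exactly the situation required by the "pop the return address" step.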

To be clear, I will "zoom in" on the first few lines and explain how data are aligned in memory. Before you push any data onto the stack it looks like below:


address   |some data  | <-- BP, SP  Bottom of the stack (BP). It is also the top of the stack (the "current" position of the SP).
address- 1|           |
address- 2|           |

When you push the first data onto the stack, say DATA, the stack will contain:


address   |xxxxxxxxxxx| <-- BP bottom of the stack. Address of the first byte you can save on the stack
address- 1|DATA byte 8|
address- 2|DATA byte 7|
address- 3|DATA byte 6|
address- 4|DATA byte 5|
address- 5|DATA byte 4|
address- 6|DATA byte 3|
address- 7|DATA byte 2|
address- 8|DATA byte 1| <-- BP-8 offset by 8 bytes (64 bits) from the bottom of the stack.
                            This is also a "current" position of the SP

When you push the next item onto the stack, say WTR, it will contain:


address   |xxxxxxxxxxx| <-- BP bottom of the stack. Address of the first byte you can save on the stack
address- 1|DATA byte 8|
address- 2|DATA byte 7|
address- 3|DATA byte 6|
address- 4|DATA byte 5|
address- 5|DATA byte 4|
address- 6|DATA byte 3|
address- 7|DATA byte 2|
address- 8|DATA byte 1| <-- BP-8 == SP+8
address- 9|WTR  byte 8|
address-10|WTR  byte 7|
address-11|WTR  byte 6|
address-12|WTR  byte 5|
address-13|WTR  byte 4|
address-14|WTR  byte 3|
address-15|WTR  byte 2|
address-16|WTR  byte 1| <-- BP-16 offset by 16 bytes (2 x 64 bits) from the bottom of the stack.
                            This is also a "current" position of the SP

From the above "zoom" you can see that byte 1 of DATA can be reached either with the address BP-8 or with SP+8.
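The same observation can be tried out with a small stand-alone sketch. The values 111 and 222 merely stand in for DATA and WTR, and the label finish is hypothetical. RBP is set to the top of the stack before anything is pushed, so it plays the role of BP from the diagrams. The program exits with code 0 when both addresses give the same value.

```nasm
section .text

global  _start

_start:
        mov rbp, rsp       ; Use the current top as the reference "bottom" (BP)
        push qword 111     ; DATA (illustrative value), stored at BP-8
        push qword 222     ; WTR (illustrative value), stored at BP-16; SP points here
        mov rax, [rbp-8]   ; DATA addressed from the bottom of the stack
        mov rbx, [rsp+8]   ; DATA addressed from the top of the stack
        mov rdi, 1         ; Exit code 1 = the two reads differ or are not DATA
        cmp rax, rbx
        jne finish
        cmp rax, 111
        jne finish
        mov rdi, 0         ; Exit code 0 = BP-8 and SP+8 both give DATA
finish:
        add rsp, 16        ; Free both pushed values (2 * 8 bytes)
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
```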

With the above in mind, the code looks as follows:


section .data              ; Data section

transTab: db "0123456789"  ; Translation table
newLine:  db 10            ; Code for printing a new line

number1: dd 12345
number2: dd 67890

section .bss               ; Block Starting Symbol section
                           ; It contains uninitialized data.

result: resb 16            ; Reserve space for result.
                           ; Max 16 digits

section .text

global  _start

; BEGIN: Print number code
; Init        
printNumber:
        ; Save registers as they are before routine execution
        push rax
        push rbx
        push rcx
        push rdx
        push rsi
        push rdi
                           ; Put data to print taken from the stack
                           ; into EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [rsp+56]  ; Set EAX to the number you want to print;
                           ; this number is on the stack
        
        xor rbx, rbx       ; Clear RBX register (set it to 0)
        mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer

; BEGIN: Prepare data
printLoop:
        mov ecx, 10
        div ecx            ; Div EDX:EAX by ECX
                           ; EAX = quotient (an integer part)
                           ; EDX = remainder
        mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
        mov [ebx], cl      ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
        dec ebx            ; Move to the previous byte in the buffer
        mov edx, 0         ; Restore default EDX value
        cmp eax, 0         ; Compare EAX with immediate value: 0
        
        jne printLoop      ; Jump if operands of previous CMP instruction
                           ; are not equal - keep looping until EAX
                           ; is zero which means that all digits are
                           ; converted. When done go to the print part
        
; BEGIN: Print result buffer
print:
                           ; Calculate length of a string to print
        xor rax, rax       ; Set RAX to be equal to 0
        mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST - SRC)
        sub eax, ebx       ; Get the length
        mov rdx, rax       ; arg3: length of a string to print
        xor rsi, rsi       ; Clear RSI register (set it to 0)
        mov esi, ebx       ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function

        ; BEGIN: Print new line
        mov rdx, 1         ; arg3: length of a string to print
        mov rsi, newLine   ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function
        ; END: Print new line
        
        ; Restore all registers
        pop rdi
        pop rsi
        pop rdx
        pop rcx
        pop rbx
        pop rax
        jmp [rsp]          ; Jump to the address saved at the top of the stack (where to return);
                           ; the caller pushed it just before the jump to this routine
                           ; (after all the arguments needed by the routine)
; END: Print number code

_start: 
        push qword [number1] ; Push into the stack 1st argument: the number to be printed
        push qword cont1   ; Push into the stack where to return from routine
        jmp printNumber    ; "Call" print routine

cont1:
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
                           ; (jmp [rsp] in the routine does not pop the return address)
        push qword [number2] ; Push into the stack 1st argument: the number to be printed
        push qword cont2   ; Push into the stack where to return from routine
        jmp printNumber    ; "Call" print routine

cont2: 
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
; BEGIN: Exit
        mov rdi, 0         ; Exit code, 0=normal
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
; End of the code

Much of the code stays untouched. The changes concern:

prologue of the routine:


        ; Save registers as they are before routine execution
        push rax
        push rbx
        push rcx
        push rdx
        push rsi
        push rdi
                           ; Put data to print taken from the stack
                           ; into EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [rsp+56]  ; Set EAX to the number you want to print;
                           ; this number is on the stack

epilogue of the routine:


        ; Restore all registers
        pop rdi
        pop rsi
        pop rdx
        pop rcx
        pop rbx
        pop rax
        jmp [rsp]          ; Jump to the address saved at the top of the stack (where to return);
                           ; the caller pushed it just before the jump to this routine
                           ; (after all the arguments needed by the routine)

the routine call:


_start: 
        push qword [number1] ; Push into the stack 1st argument: the number to be printed
        push qword cont1   ; Push into the stack where to return from routine
        jmp printNumber    ; "Call" print routine

cont1:
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
                           ; (jmp [rsp] in the routine does not pop the return address)
        push qword [number2] ; Push into the stack 1st argument: the number to be printed
        push qword cont2   ; Push into the stack where to return from routine
        jmp printNumber    ; "Call" print routine

cont2:
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument

Almost perfect

The code you have now is almost perfect. The one thing it is missing is the typical function prologue. Look at the beginning of the routine:


        ; Save registers as they are before routine execution
        push rax
        push rbx
        push rcx
        push rdx
        push rsi
        push rdi
                           ; Put data to print taken from the stack
                           ; into EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [rsp+56]  ; Set EAX to the number you want to print;
                           ; this number is on the stack

The problem with this code is that the offset of the first argument, 56, depends on the "local" values you put on the stack right after entering the routine (the sequence of pushes that saves the registers). If you change them (add another push or remove an existing one), you have to remember to update 56 to the correct value. This is not wrong, but it is easy to forget. It is also unnecessary, because the location of the arguments does not depend on any local activity on the stack (assuming you do not destroy the stack contents).

A better solution is to change the reference point: instead of RSP, which changes constantly, use RBP, which points to the bottom of the stack frame. For this reason most functions begin with a well-known sequence of instructions called the function prologue:


        ; Classic function prologue
        push rbp
        mov rbp, rsp

This "saves" the current value of the base pointer (the bottom of the "current" stack frame) with push rbp and then replaces it with the stack pointer (the tip/top of the stack) with mov rbp, rsp. So the new base pointer is the current top of the stack.

This changes the stack layout a little:


address    old BP          xxxx
address- 8 new BP+[2*8=16] DATA   SP+[8*8=64]
address-16 new BP+[1*8= 8] WTR    SP+[7*8=56]
address-24 new BP          RBP    SP+[6*8=48] <- save "old" base pointer; from now (new) base pointer = (current) stack pointer
address-32 new BP-[1*8= 8] RAX    SP+[5*8=40]
address-40 new BP-[2*8=16] RBX    SP+[4*8=32]
address-48 new BP-[3*8=24] RCX    SP+[3*8=24]
address-56 new BP-[4*8=32] RDX    SP+[2*8=16]
address-64 new BP-[5*8=40] RSI    SP+[1*8= 8]
address-72 new BP-[6*8=48] RDI    SP

Consequently, if you have a function prologue, you should also have a function epilogue that restores the "old" base pointer:


        ; Classic function epilogue
        pop rbp
        ; Return from routine

With the prologue and epilogue in place, you have to modify only one instruction. Replace:


mov eax, [rsp+56]  ; Set EAX to the number you want to print;

with


mov eax, [rbp+16]  ; Set EAX to the number you want to print;

With this change the first argument is always available at rbp+16, no matter how many registers the routine saves afterwards.
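To see why the RBP-relative offset is stable, consider the following stand-alone sketch. The routine getArg, the label back and the value 42 are hypothetical and used only for illustration. The routine pushes three registers after the prologue. You can add or remove pushes (keeping the pops symmetric) and the argument stays at rbp+16, because the only things between RBP and the argument are the saved RBP and the return address. The program uses the value it read as its exit code.

```nasm
section .text

global  _start

; Hypothetical routine: copies its first stack argument to RAX
getArg:
        push rbp           ; Classic function prologue
        mov rbp, rsp
        push rbx           ; Any number of local pushes; change it freely
        push rcx
        push rdx
        mov rax, [rbp+16]  ; RBP+0: saved RBP, RBP+8: return address, RBP+16: argument
        pop rdx            ; Undo the local pushes in the reverse order
        pop rcx
        pop rbx
        pop rbp            ; Classic function epilogue
        jmp [rsp]          ; Return: jump to the address at the top of the stack

_start:
        push qword 42      ; Argument (illustrative value)
        push qword back    ; Where to return from the routine
        jmp getArg         ; "Call" the routine
back:
        add rsp, 16        ; Free the return address and the argument
        mov rdi, rax       ; Use the value read by the routine as the exit code
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
```

With an RSP-relative access, the same read would need a different offset for every change in the number of pushes.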

The changes compared to the previous code are minor, but for clarity the complete code is given below:


section .data              ; Data section

transTab: db "0123456789"  ; Translation table
newLine:  db 10            ; Code for printing a new line

number1: dd 12345
number2: dd 67890

section .bss               ; Block Starting Symbol section
                           ; It contains uninitialized data.

result: resb 16            ; Reserve space for result.
                           ; Max 16 digits

section .text

global  _start

; BEGIN: Print number code
; Init        
printNumber:
        ; Classic function prologue
        push rbp
        mov rbp, rsp
        ; Save registers as they are before routine execution
        push rax
        push rbx
        push rcx
        push rdx
        push rsi
        push rdi
                           ; Put data to print taken from the stack
                           ; into EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [rbp+16]  ; Set EAX to the number you want to print;
                           ; this number is on the stack
        
        xor rbx, rbx       ; Clear RBX register (set it to 0)
        mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer

; BEGIN: Prepare data
printLoop:
        mov ecx, 10
        div ecx            ; Div EDX:EAX by ECX
                           ; EAX = quotient (an integer part)
                           ; EDX = remainder
        mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
        mov [ebx], cl      ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
        dec ebx            ; Move to the previous byte in the buffer
        mov edx, 0         ; Restore default EDX value
        cmp eax, 0         ; Compare EAX with immediate value: 0
        
        jne printLoop      ; Jump if operands of previous CMP instruction
                           ; are not equal - keep looping until EAX
                           ; is zero which means that all digits are
                           ; converted. When done go to the print part
        
; BEGIN: Print result buffer
print:
                           ; Calculate length of a string to print
        xor rax, rax       ; Set RAX to be equal to 0
        mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST - SRC)
        sub eax, ebx       ; Get the length
        mov rdx, rax       ; arg3: length of a string to print
        xor rsi, rsi       ; Clear RSI register (set it to 0)
        mov esi, ebx       ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function

        ; BEGIN: Print new line
        mov rdx, 1         ; arg3: length of a string to print
        mov rsi, newLine   ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function
        ; END: Print new line
        
        ; Restore all registers
        pop rdi
        pop rsi
        pop rdx
        pop rcx
        pop rbx
        pop rax
        ; Classic function epilogue
        pop rbp
        ; Return from routine
        jmp [rsp]          ; Jump to the address saved at the top of the stack (where to return);
                           ; the caller pushed it just before the jump to this routine
                           ; (after all the arguments needed by the routine)
; END: Print number code

_start: 
        push qword [number1] ; Push into the stack 1st argument: the number to be printed
        push qword cont1   ; Push into the stack where to return from routine
        jmp printNumber    ; "Call" print routine

cont1:
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
                           ; (jmp [rsp] in the routine does not pop the return address)
        push qword [number2] ; Push into the stack 1st argument: the number to be printed
        push qword cont2   ; Push into the stack where to return from routine
        jmp printNumber    ; "Call" print routine

cont2: 
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
; BEGIN: Exit
        mov rdi, 0         ; Exit code, 0=normal
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
; End of the code

Exercise 1 Modify the printing algorithm so that it prints a new line, or not, depending on the second argument passed to the routine (the first argument is the number you want to print).

Separate function code

Once you have tested your code, you may want to move it to a separate file so that you can call it from any other program you write. The approach is similar to the one presented in the section Multiple files of the chapter First program.

routine_print.asm


section .data

transTab: db "0123456789"  ; Translation table
newLine:  db 10            ; Code for printing a new line

sys_exit    equ 60
sys_write   equ 1
stdout      equ 1

section .bss               ; Block Starting Symbol section
                           ; It contains uninitialized data.

result: resb 16            ; Reserve space for result.
                           ; Max 16 digits

section .text

global print_number_32
global exit

; BEGIN: Print number code
; Init
print_number_32:
        ; Classic function prologue
        push rbp
        mov rbp, rsp
        ; Save registers as they are before routine execution
        push rax
        push rbx
        push rcx
        push rdx
        push rsi
        push rdi
                           ; Put data to print taken from the stack
                           ; into EDX:EAX
                           ; For simplicity I assume EDX is always equal to 0
        mov edx, 0         ; Set EDX to default value
        mov eax, [rbp+16]  ; Set EAX to the number you want to print;
                           ; this number is on the stack
        
        xor rbx, rbx       ; Clear RBX register (set it to 0)
        mov ebx, result+15 ; Set EBX part of RBX to point to the end of the buffer

; BEGIN: Prepare data
printLoop:
        mov ecx, 10
        div ecx            ; Div EDX:EAX by ECX
                           ; EAX = quotient (an integer part)
                           ; EDX = remainder
        mov ecx, [transTab + edx] ; Copy ASCII value corresponding to remainder to ECX
        mov [ebx], cl      ; Copy CL part of ECX (1 byte instead of 4 bytes) to 'result' buffer
        dec ebx            ; Move to the previous byte in the buffer
        mov edx, 0         ; Restore default EDX value
        cmp eax, 0         ; Compare EAX with immediate value: 0
        
        jne printLoop      ; Jump if operands of previous CMP instruction
                           ; are not equal - keep looping until EAX
                           ; is zero which means that all digits are
                           ; converted. When done go to the print part
        
; BEGIN: Print result buffer
print:
                           ; Calculate length of a string to print
        xor rax, rax       ; Set RAX to be equal to 0
        mov eax, result+16 ; Prepare `DST` argument for SUB (DST := DST - SRC)
        sub eax, ebx       ; Get the length
        mov rdx, rax       ; arg3: length of a string to print
        xor rsi, rsi       ; Clear RSI register (set it to 0)
        mov esi, ebx       ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function

        ; BEGIN: Print new line
        mov rdx, 1         ; arg3: length of a string to print
        mov rsi, newLine   ; arg2: pointer to a string
        mov rdi, 1         ; arg1: where to write, so called `file descriptor`
                           ; in this case stdout (screen)
        mov rax, 1         ; System call number (sys_write)
        syscall            ; Call a system function
        ; END: Print new line
        
        ; Restore all registers
        pop rdi
        pop rsi
        pop rdx
        pop rcx
        pop rbx
        pop rax
        ; Classic function epilogue
        pop rbp
        ; Return from routine
        jmp [rsp]          ; Jump to the address saved at the top of the stack (where to return);
                           ; the caller pushed it just before the jump to this routine
                           ; (after all the arguments needed by the routine)
; END: Print number code


; BEGIN: Exit
exit:
    mov rdi, 0         ; Exit code, 0=normal
    mov rax, 60        ; System call number (sys_exit)
    syscall            ; Call a system function
; END: Exit

main.asm


section .data              ; Data section

number1: dd 12345
number2: dd 67890

section .text

extern print_number_32 
extern exit

global  _start
 
_start:
        push qword [number1] ; Push into the stack 1st argument: the number to be printed
        push qword cont1   ; Push into the stack where to return from routine
        jmp print_number_32 ; "Call" print routine

cont1:
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
                           ; (jmp [rsp] in the routine does not pop the return address)
        push qword [number2] ; Push into the stack 1st argument: the number to be printed
        push qword cont2   ; Push into the stack where to return from routine
        jmp print_number_32  ; "Call" print routine

cont2: 
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
        jmp exit
; End of the code

Compilation and execution result:


fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ nasm -f elf64 routine_print.asm -o routine_print.o
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ nasm -f elf64 main.asm -o main.o
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ ld main.o routine_print.o -o print_test
fulmanp@fulmanp-ThinkPad-T540p:~/Desktop/assembler/03_second_program$ ./print_test 
12345
67890

Final change

As the final change, you can replace the "manual" routine call and return with the dedicated solution based on the CALL and RET instructions. In main.asm replace:


_start:
        push qword [number1] ; Push into the stack 1st argument: the number to be printed
        push qword cont1   ; Push into the stack where to return from routine
        jmp print_number_32 ; "Call" print routine

cont1:
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
                           ; (jmp [rsp] in the routine does not pop the return address)
        push qword [number2] ; Push into the stack 1st argument: the number to be printed
        push qword cont2   ; Push into the stack where to return from routine
        jmp print_number_32  ; "Call" print routine

cont2: 
        add rsp, 16        ; Clear the stack - "take out" the return address and the argument
        jmp exit
; End of the code

with:


_start:
        push qword [number1] ; Push into the stack 1st argument: the number to be printed
        call print_number_32 ; "Call" print routine
        add rsp, 8         ; Clear the stack - "take out" first element from the stack

        push qword [number2] ; Push into the stack 1st argument: the number to be printed
        call print_number_32  ; "Call" print routine
        add rsp, 8         ; Clear the stack - "take out" first element from the stack

        call exit
; End of the code

and in routine_print.asm replace:


jmp [rsp]

with:

ret
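What the pair does can be observed directly: CALL pushes the address of the instruction that follows it and jumps to the routine, and RET pops that address and jumps to it. The stand-alone sketch below illustrates this. The routine whereFrom and the labels afterCall and finish are hypothetical names. The routine copies its own return address from the top of the stack, and the caller compares it with the address of the instruction right after CALL. The exit code is 0 when they are equal and 1 otherwise.

```nasm
section .text

global  _start

; Hypothetical routine: copies its own return address to RAX
whereFrom:
        mov rax, [rsp]     ; CALL left the return address at the top of the stack
        ret                ; Pop the return address and jump to it

_start:
        call whereFrom     ; Push the address of afterCall, then jump to whereFrom
afterCall:
        mov rdi, 1         ; Exit code 1 = unexpected return address
        mov rbx, afterCall ; Address of the instruction right after CALL
        cmp rax, rbx
        jne finish
        mov rdi, 0         ; Exit code 0 = RAX holds the address of afterCall
finish:
        mov rax, 60        ; System call number (sys_exit)
        syscall            ; Call a system function
```

Because RET removes the return address from the stack itself, the caller only has to free its own arguments, which is why add rsp, 8 directly after call is enough when a single argument was pushed.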

Sometimes, mostly when you use online compilers, you have to keep all the code in one file. For this case you can download the final version of the print routine in a single file.

Exercise 2 Modify the printing algorithm so that it uses the stack to print the digits in the correct order.

Exercise 3 Modify the printing algorithm so that it prints the bits stored at a given address.

Exercise 4 Modify the printing algorithm so that it prints a number (bits) of a given size (byte, word, double-word, quad-word) stored at a given address.