Chapter 2
First program

Initial version: 2025-02-19
Last update: 2025-02-19

In this chapter you will learn precisely what you have to do to get an executable file. As you will see, this is not entirely simple, because you have a choice between 32-bit and 64-bit executables.



32-bit basic stand alone program


Making 32-bit code on 32-bit system with NASM



section .data

text:   db "Hello World!", 10
len:    equ $-text

section .text

global  _start
 
_start:
        mov     edx, len
        mov     ecx, text
        mov     ebx, 1

        mov     eax, 4
        int     0x80
 
; Exit
        mov     ebx, 0
        mov     eax, 1
        int     0x80
; End of the code
Verify correctness of the code by assembling it with:


nasm -f elf hello.asm
linking:


ld hello.o -o hello
and finally running


./hello
If no errors are reported, the result is as follows:


fulmanp@fulmanp:~/assembler$ ./hello
Hello World!


Making 32-bit code on 64-bit system with NASM


To make a 32-bit program on a 64-bit system, assemble it as before:


nasm -f elf hello.asm
but link as:

ld -m elf_i386 hello.o -o hello
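The -m elf_i386 option makes ld produce 32-bit output; on a 64-bit system ld targets x86-64 by default and refuses to link the 32-bit object file. The result is a 32-bit program, which you can verify with the readelf Unix command: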


fulmanp@fulmanp-k2:~/assembler$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048080
  Start of program headers:          52 (bytes into file)
  Start of section headers:          216 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         2
  Size of section headers:           40 (bytes)
  Number of section headers:         6
  Section header string table index: 3


Making (pseudo) 64-bit code on 64-bit system with NASM


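The code presented above can also be assembled, without any changes, as a 64-bit program. It would not be a real 64-bit program, however, because it still uses 32-bit registers and the 32-bit system call convention; it runs only because 64-bit Linux kernels usually keep the int 0x80 interface available for compatibility. Assemble and link it with: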


fulmanp@fulmanp-k2:~/assembler$ nasm -f elf64 hello.asm
fulmanp@fulmanp-k2:~/assembler$ ld hello.o -o hello
fulmanp@fulmanp-k2:~/assembler$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4000b0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          264 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         6
  Section header string table index: 3


Getting content of assembled file


If you wonder about the content of an assembled or linked file, you can use the xxd Unix command to dump it in a "readable" format:


fulmanp@fulmanp-k2:~/assembler$ xxd hello.o
0000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
0000010: 0100 0300 0100 0000 0000 0000 0000 0000  ................
0000020: 4000 0000 0000 0000 3400 0000 0000 2800  @.......4.....(.
0000030: 0700 0300 0000 0000 0000 0000 0000 0000  ................
0000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0000 0000 0000 0000 0100 0000 0100 0000  ................
0000070: 0300 0000 0000 0000 6001 0000 0d00 0000  ........`.......
0000080: 0000 0000 0000 0000 0400 0000 0000 0000  ................
0000090: 0700 0000 0100 0000 0600 0000 0000 0000  ................
00000a0: 7001 0000 2200 0000 0000 0000 0000 0000  p..."...........
00000b0: 1000 0000 0000 0000 0d00 0000 0300 0000  ................
00000c0: 0000 0000 0000 0000 a001 0000 3100 0000  ............1...
00000d0: 0000 0000 0000 0000 0100 0000 0000 0000  ................
00000e0: 1700 0000 0200 0000 0000 0000 0000 0000  ................
00000f0: e001 0000 7000 0000 0500 0000 0600 0000  ....p...........
0000100: 0400 0000 1000 0000 1f00 0000 0300 0000  ................
0000110: 0000 0000 0000 0000 5002 0000 1b00 0000  ........P.......
0000120: 0000 0000 0000 0000 0100 0000 0000 0000  ................
0000130: 2700 0000 0900 0000 0000 0000 0000 0000  '...............
0000140: 7002 0000 0800 0000 0400 0000 0200 0000  p...............
0000150: 0400 0000 0800 0000 0000 0000 0000 0000  ................
0000160: 4865 6c6c 6f20 576f 726c 6421 0a00 0000  Hello World!....
0000170: ba0d 0000 00b9 0000 0000 bb01 0000 00b8  ................
0000180: 0400 0000 cd80 bb00 0000 00b8 0100 0000  ................
0000190: cd80 0000 0000 0000 0000 0000 0000 0000  ................
00001a0: 002e 6461 7461 002e 7465 7874 002e 7368  ..data..text..sh
00001b0: 7374 7274 6162 002e 7379 6d74 6162 002e  strtab..symtab..
00001c0: 7374 7274 6162 002e 7265 6c2e 7465 7874  strtab..rel.text
00001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001f0: 0100 0000 0000 0000 0000 0000 0400 f1ff  ................
0000200: 0000 0000 0000 0000 0000 0000 0300 0100  ................
0000210: 0000 0000 0000 0000 0000 0000 0300 0200  ................
0000220: 0b00 0000 0000 0000 0000 0000 0000 0100  ................
0000230: 1000 0000 0d00 0000 0000 0000 0000 f1ff  ................
0000240: 1400 0000 0000 0000 0000 0000 1000 0200  ................
0000250: 0068 656c 6c6f 2e61 736d 0074 6578 7400  .hello.asm.text.
0000260: 6c65 6e00 5f73 7461 7274 0000 0000 0000  len._start......
0000270: 0600 0000 0102 0000 0000 0000 0000 0000  ................
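
The dump can be matched against the source by hand. The sketch below repeats hello.asm with the bytes from the dump above written next to each line. The file offsets 0x160 and 0x170 are read off this particular dump and should be treated as an assumption: another NASM version may lay the object file out differently.

```nasm
section .data                   ; starts at file offset 0x160 in the dump

text:   db "Hello World!", 10   ; 48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a
len:    equ $-text              ; emits no bytes; len = 13 = 0x0d

section .text                   ; starts at file offset 0x170 in the dump

global  _start

_start:
        mov     edx, len        ; ba 0d 00 00 00
        mov     ecx, text       ; b9 00 00 00 00  (address filled in by ld)
        mov     ebx, 1          ; bb 01 00 00 00

        mov     eax, 4          ; b8 04 00 00 00
        int     0x80            ; cd 80

; Exit
        mov     ebx, 0          ; bb 00 00 00 00
        mov     eax, 1          ; b8 01 00 00 00
        int     0x80            ; cd 80
; End of the code
```

Each mov of a 32-bit immediate is one opcode byte (b8 plus the register number) followed by the value in little-endian order, which is why 13 appears as 0d 00 00 00. The address of text is still zero in hello.o; the .rel.text section visible near the end of the dump tells ld where to patch it in.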


Explaining the code


Knowing that it works, it is now time to explain why it works. Let's study the code line by line. As a reminder, the code is:


section .data

text:   db "Hello World!", 10
len:    equ $-text

section .text

global  _start
 
_start:
        mov     edx, len
        mov     ecx, text
        mov     ebx, 1

        mov     eax, 4
        int     0x80
 
; Exit
        mov     ebx, 0
        mov     eax, 1
        int     0x80
; End of the code
  • The ; character starts a comment, which extends to the end of the line.
  • section .data
    Start of the data section. Data and code are kept in separate sections.
  • text: db "Hello World!", 10
    Definition of the text to print, terminated by a newline. This code targets the Linux operating system, so the newline marker is a single line feed character (LF, decimal code 10). By the way, MS-DOS chose CR+LF (decimal 13 and 10), and Windows inherited this.
  • len: equ $ - text
    Definition of a constant equal to the current address ($) minus the address of the first byte of text -- that is, the length of the text you are going to print. Notice that len is a value (an assembly-time constant), not an address. If you prefer a variable, replace this line with len: dd $-text.
  • section .text
    Start of the code (program) section. Data and code are kept in separate sections.
  • global _start
    Make the label available to the linker. The entry point must be exported to the ELF linker or loader, which conventionally recognize _start as the entry point. Use ld -e foo to override the default.
  • _start:
    Label; standard ld entry point.
  • mov edx, len (or mov edx, [len] if you prefer variables to constants)
    Move (copy, insert, put) into the EDX register (EDX is a 32-bit register, RDX is its 64-bit equivalent) the length of the text to print -- this is the third argument of the function you are going to call. In the first case the length is a constant; in the second you read it from a variable. By the way, copying data with MOV directly from one memory cell to another is not allowed:

    
    mov [dest], [src] ; this is not allowed
    


    Here I have to mention some basics about registers so that you can work through the next few sections. For now you will use a set of registers whose names follow this pattern:

    
    <register> ::= <prefix><letter>X
    <register> ::= <letter><suffix>
    
    <letter> ::= A | B | C | D
    <prefix> ::= R | E
    <suffix> ::= X | H | L
    
    where, for example, correct register names for letter A are:

    
    RAX, EAX, AX, AH, AL
    
    In this case you refer to register A and its different parts and sizes:

    
    6            33      11   00   0    
    3            21      65   87   0
    |             |       |   ||   |
    |             |       |.AH||.AL|   AH and  AL:  8 bits
    |             |       |...AX...|           AX: 16 bits
    |             |......EAX.......|          EAX: 32 bits
    |............RAX...............|          RAX: 64 bits
    
  • mov ecx, text
    Copy to the ECX register (RSI in the equivalent 64-bit code) the address of the first byte of the text -- this is the second argument of the function you are going to call.
  • mov ebx, 1
    Copy to the EBX register (RDI) the value 1 – this is the first argument of the function you are going to call, the so-called file descriptor or file handle, indicating where to write (in this case stdout – standard output, i.e. the screen). Other file descriptors are: 0 – standard input (stdin) and 2 – standard error (stderr).
  • mov eax, 4
    Copy to the EAX register (RAX) the value 4 (1 in 64-bit code). This is the number of the Linux function sys_write you are going to call. Notice that these numbers differ between architectures and operating systems.
  • int 0x80 (syscall)
    Interrupt that calls the system function selected by the EAX register (RAX). In this case it is the sys_write function, which takes three arguments in the registers EBX, ECX and EDX (RDI, RSI and RDX).

    A 32-bit system function takes at most 6 arguments from the registers EBX, ECX, EDX, ESI, EDI and EBP. EAX specifies the number of the system function you are going to call.

    A 64-bit system function takes at most 6 arguments from the registers RDI, RSI, RDX, R10, R8 and R9. RAX specifies the number of the system function. The values in the registers RCX and R11 are destroyed.

    More precisely: INT means interrupt, and 0x80 is the interrupt number. An interrupt "transfers" the program flow to whoever is handling that interrupt. In Linux, the handler of interrupt 0x80 is the kernel, and programs use it to make system calls.

    The kernel learns which system call the program wants to make by examining the value in the EAX register. Each system call has its own requirements for the other registers. For example, a value of 1 in EAX means the exit() system call; in this case EBX holds the status code for exit().
  • mov ebx, 0
    Copy to the EBX register (RDI) the value 0 -- this is the first argument of the function you are going to call, the so-called exit status (errorlevel), indicating whether the program terminated correctly or not (0 means that everything was all right and the program terminates normally).
  • mov eax, 1
    Copy to the EAX register (RAX) the value 1 (60 in 64-bit code). This is the number of the Linux function sys_exit, which you are going to call to terminate the program.
  • int 0x80 (syscall)
    Interrupt to call system function selected by EAX register (RAX).
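
Three points from the walkthrough above can be tried out in one small variant of the program: len kept as a variable instead of a constant, a copy between two memory cells routed through a register, and writes to the parts of register A. This is only a sketch; the file name hello_var.asm and the variable copy are names made up for the illustration.

```nasm
; hello_var.asm (hypothetical file name)
; assemble:  nasm -f elf hello_var.asm
; link:      ld -m elf_i386 hello_var.o -o hello_var

section .data

text:   db "Hello World!", 10
len:    dd $-text               ; variable: 4 bytes in memory holding 13
copy:   dd 0                    ; second variable, made up for this sketch

section .text

global  _start

_start:
        ; mov [copy], [len] cannot be encoded, so go through a register
        mov     eax, [len]      ; memory -> register
        mov     [copy], eax     ; register -> memory

        mov     edx, [copy]     ; arg3: length, read from a variable
        mov     ecx, text       ; arg2: address of the text
        mov     ebx, 1          ; arg1: stdout
        mov     eax, 4          ; sys_write
        int     0x80

        ; Writing to a part of register A changes only that part
        mov     eax, 0x11223344 ; EAX = 0x11223344
        mov     ax, 0x5566      ; EAX = 0x11225566 (bits 0..15 replaced)
        mov     ah, 0x00        ; EAX = 0x11220066 (bits 8..15 replaced)
        mov     al, [len]       ; EAX = 0x1122000d (bits 0..7 replaced)

        movzx   ebx, al         ; EBX = 13, used as the exit status
        mov     eax, 1          ; sys_exit
        int     0x80
```

After running ./hello_var, the shell command echo $? should print 13, the value that travelled through AL.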


Code for GNU AS


Now take a look at the same program written in a different "dialect" of assembly: GNU Assembler (also GNU AS or simply GAS).


.data                     # Data section

text: .ascii "Hello World!\n"
len = . - text

.text

.global  _start
 
_start:
      movl   $len, %edx
      movl   $text, %ecx
      movl   $1, %ebx

      movl   $4, %eax
      int    $0x80

# Exit
      movl   $0, %ebx
      movl   $1, %eax
      int    $0x80
# End of the code
The code looks a little bit strange, but it is equivalent to the NASM version presented earlier, which you can verify by assembling it:


as hello.s -o hello.o
linking:


ld hello.o -o hello
and finally running:


fulmanp@fulmanp-k2:~/assembler$ ./hello
Hello World!


Making 32-bit code on 64-bit system with GNU AS


As with NASM, making 32-bit code on a 64-bit system with GNU AS requires additional options:


fulmanp@fulmanp-k2:~/assembler$ as --32 hello.s -o hello.o
fulmanp@fulmanp-k2:~/assembler$ ld -m elf_i386 hello.o -o hello
As before, you can verify that this is 32-bit code:

fulmanp@fulmanp-k2:~/assembler$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048074
  Start of program headers:          52 (bytes into file)
  Start of section headers:          204 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         2
  Size of section headers:           40 (bytes)
  Number of section headers:         6
  Section header string table index: 3
As you may notice, the NASM and GNU AS versions are quite similar, but they differ in details. This is because NASM uses Intel syntax while GNU AS uses AT&T syntax by default. The next section describes the most important differences between them.

Intel vs. AT&T assembly syntax


  • Comments
    GNU AS supports two comment styles:
    • Multi-line comments. As in C multi-line comments start and end with mirroring slash-asterisk pairs:
      
      /* 
      comment
      */
      
    • Single-line comments. On the i386 and x86-64 platforms these start with the hash symbol (#) and run to the end of the line.
  • Register names
    In AT&T syntax register names are prefixed with %. To reference EAX:

    
    AT&T:  %eax
    Intel: eax
    
  • Source/destination order
    In AT&T syntax the source is on the left and the destination is on the right -- the opposite of Intel syntax. To load EBX with the value in EAX:

    
    AT&T:  movl %eax, %ebx
    Intel: mov ebx, eax
    
  • Constant/immediate value format
    In AT&T syntax constant/immediate values are prefixed with $. To load EAX with the address of the variable foo:

    
    AT&T:  movl $foo, %eax
    Intel: mov eax, foo
    
    To load EBX with 1:
    
    AT&T:  movl $1, %ebx
    Intel: mov ebx, 1
    
  • Operand size specification
    In GNU AS the instruction mnemonic is suffixed with one of b, w or l to specify the width of the operands as a byte, word or longword (double word); 64-bit code adds q for a quadword. If you omit the suffix, GNU AS tries to infer the size from the register operands, but when no register is involved (for example, an immediate stored to memory) it has nothing to go on. A wrong size shows up only when the code runs and may be very difficult to diagnose, so it is better to always use these suffixes.

    
    AT&T:  movw %ax, %bx
    Intel: mov bx, ax
    
  • Referencing memory
    Here is the canonical format for 32-bit addressing:

    
    AT&T:  immed32(basepointer,indexpointer,indexscale)
    Intel: [basepointer + indexpointer*indexscale + immed32]
    
    The formula to calculate the address is:
    
    immed32 + basepointer + indexpointer * indexscale
    
    You don't have to use all those fields, but you have to use at least one of immed32 or basepointer. For example:

    • Addressing a particular variable:
      
      AT&T:  foo
      Intel: [foo]
      
    • Addressing what a register points to:
      
      AT&T:  (%eax)
      Intel: [eax]
      
    • Addressing a variable offset by a value in a register:
      
      AT&T:  variable(%eax)
      Intel: [eax + variable]
      
    • Addressing a value in an array of integers (scaling up by 4):
      
      AT&T:  array(,%eax,4)
      Intel: [eax*4 + array]
      
    • Offsets with the immediate value:
      
      AT&T:  1(%eax)
      Intel: [eax + 1]
      
    • Addressing a particular char(acter) in an array of 8-character records (EAX holds the number of the record desired, EBX has the wanted char's offset within the record):
      
      AT&T:  array(%ebx,%eax,8)
      Intel: [ebx + eax*8 + array]
      


  • The table below summarizes all major differences between Intel and AT&T syntax:

    
    +------------------------------+------------------------------------+
    |       Intel Code             |      AT&T Code                     |
    +------------------------------+------------------------------------+
    | mov     eax,1                |  movl    $1,%eax                   |
    | mov     ebx,0ffh             |  movl    $0xff,%ebx                |
    | int     80h                  |  int     $0x80                     |
    | mov     ebx, eax             |  movl    %eax, %ebx                |
    | mov     eax,[ecx]            |  movl    (%ecx),%eax               |
    | mov     eax,[ebx+3]          |  movl    3(%ebx),%eax              | 
    | mov     eax,[ebx+20h]        |  movl    0x20(%ebx),%eax           |
    | add     eax,[ebx+ecx*2h]     |  addl    (%ebx,%ecx,0x2),%eax      |
    | lea     eax,[ebx+ecx]        |  leal    (%ebx,%ecx),%eax          |
    | sub     eax,[ebx+ecx*4h-20h] |  subl    -0x20(%ebx,%ecx,0x4),%eax |
    +------------------------------+------------------------------------+
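
The addressing forms listed above can be checked with a small 32-bit NASM program. This is a sketch only: the file name addr.asm, the labels array and records and their contents are made up for the illustration, and the AT&T equivalents are given in the comments.

```nasm
; addr.asm (hypothetical file name)
; assemble:  nasm -f elf addr.asm
; link:      ld -m elf_i386 addr.o -o addr

section .data

array:   dd 10, 20, 30, 40              ; four 32-bit integers
records: db "ABCDEFGH", "IJKLMNOP"      ; two 8-character records

section .text

global  _start

_start:
        mov     eax, 2                  ; element index
        mov     edx, [array + eax*4]    ; AT&T: array(,%eax,4)
                                        ; EDX = 30

        mov     eax, 1                  ; record number
        mov     ebx, 3                  ; character offset within the record
        movzx   ecx, byte [records + ebx + eax*8]
                                        ; AT&T: records(%ebx,%eax,8)
                                        ; ECX = 'L' (byte 1*8 + 3 = 11)

        sub     ecx, 'A'                ; 'L' - 'A' = 11
        lea     ebx, [edx + ecx]        ; AT&T: leal (%edx,%ecx),%ebx
                                        ; EBX = 30 + 11 = 41, no memory access

; Exit with EBX as the status code
        mov     eax, 1
        int     0x80
```

After running ./addr, the shell command echo $? should print 41. Note that lea uses the same address arithmetic but only computes the sum; it never reads memory.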
    


Making (pseudo) 64-bit code on 64-bit system with GNU AS


TODO

64-bit basic stand alone program


Code for NASM


    
    ;  This program demonstrates basic text output to a screen.
    ;  No "C" library functions are used.
    ;  Calls are made to the operating system directly.
    ;
    ; assemble:     nasm -f elf64 hello_64.asm -o hello_64.o
    ; link:         ld hello_64.o -o hello_64
    ; run:          ./hello_64
    ; output is:    Hello World!
     
    section .data              ; Data section
    
    text:   db "Hello World!", 10  ; The string to print, 10=LF
    len:    equ $-text         ; "$" means "here"
                               ; len is a value, not an address
     
    section .text              ; Code section
    
    global  _start             ; Make label available to linker
                               ; We must export the entry point to the ELF linker or
                               ; loader. They conventionally recognize _start as their
                               ; entry point. Use ld -e foo to override the default.
     
    _start:                    ; Standard  ld  entry point
            mov     rdx, len   ; arg3: length of string to print
            mov     rsi, text  ; arg2: pointer to string
            mov     rdi, 1     ; arg1: where to write, so called file descriptor
                               ; in this case stdout (screen)
            mov     rax, 1     ; System call number (sys_write)
            syscall            ; Call a system function
     
    ; Exit
            mov     rdi, 0     ; Exit code, 0=normal
            mov     rax, 60    ; System call number (sys_exit)
            syscall            ; Call a system function
    ; End of the code
    
Verify correctness of the code by assembling it:

    nasm -f elf64 hello_64.asm -o hello_64.o

linking:

    ld hello_64.o -o hello_64

and finally running:
    
    fulmanp@fulmanp-k2:~/assembler$ ./hello_64
    Hello World!
    
For an explanation of the code, see the preceding section Explaining the code.

Notice that if you take the code from the section 32-bit basic stand alone program, replace all 32-bit registers with their 64-bit equivalents (e.g. EAX with RAX) and even assemble it as a 64-bit program, the result is still not a real 64-bit program, because it keeps entering the kernel through int 0x80 with the 32-bit system call numbers and argument registers. This was mentioned in the section Making (pseudo) 64-bit code on 64-bit system with NASM.
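
The 64-bit convention described in the section Explaining the code (arguments in RDI, RSI and RDX, function number in RAX, RCX and R11 destroyed) can be observed with a slightly extended program. The sketch below additionally relies on a fact not stated above: on Linux x86-64 the result of a system call comes back in RAX. The file name hello_ret.asm is made up for the illustration.

```nasm
; hello_ret.asm (hypothetical file name)
; assemble:  nasm -f elf64 hello_ret.asm -o hello_ret.o
; link:      ld hello_ret.o -o hello_ret

section .data

text:   db "Hello World!", 10
len:    equ $-text

section .text

global  _start

_start:
        mov     rdx, len        ; arg3: length of the text
        mov     rsi, text       ; arg2: pointer to the text
        mov     rdi, 1          ; arg1: stdout
        mov     rax, 1          ; sys_write in the 64-bit table
        syscall                 ; RCX and R11 are destroyed here;
                                ; RAX now holds the result of sys_write

; Exit, reporting the number of bytes written
        mov     rdi, rax        ; exit status = bytes written (13 on success)
        mov     rax, 60         ; sys_exit in the 64-bit table
        syscall
```

After running ./hello_ret, the shell command echo $? should print 13 as long as the write succeeded.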

Code for GNU AS


TODO

At this point you should have an idea of what distinguishes 32-bit code from 64-bit code. If you still need more information on this topic, I recommend a nice thread on Stack Overflow ([stov_001]).

Multiple files


Imagine that you want to distribute your code across many files, like this:

    
    File 1: routines.asm
    
    os_return:
        ;some code to return to os
    do_something:
        ;some code to do something
    
    File 2: useRoutines.asm
    
    main:
       call do_something ; call function from separate file to do something
       ... maybe do something else here ...
       call os_return    ; call function from separate file to finish program
    
You can do this quite naturally:

File routines.asm:
    
    section .data
    
    strHello    db  "Hello", 10
    strLen      equ $ - strHello
    
    sys_exit    equ 1
    sys_write   equ 4
    stdout      equ 1
    
    section .text
    
    global do_something
    global exit
    
    do_something:
        mov     edx, strLen
        mov     ecx, strHello
        mov     eax, sys_write
        mov     ebx, stdout
        int     0x80   
        ret
            
    exit:
        mov     eax, sys_exit
        xor     ebx, ebx
        int     0x80
        ret
    


File useRoutines.asm:
    
    section .text
    
    extern do_something 
    extern exit
    global _start 
    
    _start:
        call    do_something
        call    exit
    
Having separate files, you can assemble, link and run them almost as you do for a single file:
    
    fulmanp@fulmanp-k2:~/assembler$ nasm -f elf -o routines.o routines.asm 
    fulmanp@fulmanp-k2:~/assembler$ nasm -f elf -o useRoutines.o useRoutines.asm 
    fulmanp@fulmanp-k2:~/assembler$ ld -m elf_i386 -o testSeparateRoutines routines.o useRoutines.o
    fulmanp@fulmanp-k2:~/assembler$ ./testSeparateRoutines 
    Hello
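
One thing to watch when calling routines from another file is which registers they overwrite. The sketch below is a third, hypothetical file (the name useRoutinesLoop.asm is made up) that links against the same routines.asm and calls do_something three times. It assumes routines.asm exactly as listed above, where do_something overwrites EAX, EBX, ECX and EDX but leaves ESI alone.

```nasm
; useRoutinesLoop.asm (hypothetical file name)
; assemble:  nasm -f elf -o useRoutinesLoop.o useRoutinesLoop.asm
; link:      ld -m elf_i386 -o testLoop routines.o useRoutinesLoop.o

section .text

extern do_something
extern exit
global _start

_start:
        mov     esi, 3          ; loop counter kept in ESI, because
                                ; do_something overwrites EAX, EBX, ECX, EDX
.again:
        call    do_something    ; prints "Hello"
        dec     esi
        jnz     .again          ; repeat until the counter reaches zero

        call    exit            ; does not return
```

Running ./testLoop should print Hello three times. Had the counter been kept in ECX, the first call would have replaced it with the address of strHello.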
    
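If you want to use GCC to link your code, you have to change useRoutines.asm a little bit: the entry label must be called main instead of _start, because GCC links in the C startup code, which provides _start itself and then calls main.

File useRoutines_for_gcc.asm: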
    
    section .text
        
    extern do_something 
    extern exit
    global main 
    
    main:
        call    do_something
        call    exit
    
    
    fulmanp@fulmanp-k2:~/assembler$ nasm -f elf -o routines.o routines.asm 
    fulmanp@fulmanp-k2:~/assembler$ nasm -f elf -o useRoutines_for_gcc.o useRoutines_for_gcc.asm 
    fulmanp@fulmanp-k2:~/assembler$ gcc -m32 -o testSeparateRoutine routines.o useRoutines_for_gcc.o
    fulmanp@fulmanp-k2:~/assembler$ ./testSeparateRoutine
    Hello