fulmanski.pl: tutorials

Chapter 3

NASM

Initial version: 2025-02-19
Last update: 2025-02-19

In this chapter you will learn the essential syntax you need to work with NASM.

Table of contents

Layout of a NASM source line
Pseudo-instructions
Effective addresses
Constants

The content of this chapter is a condensed summary of the official NASM documentation [nasm_doc].

Layout of a NASM source line

Each NASM source line contains (unless it is a macro, a preprocessor directive or an assembler directive) some combination of the four fields:


label:    instruction operands        ; comment

The presence or absence of any combination of a label, an instruction and a comment is allowed. Of course, the operand field is either required or forbidden by the presence and nature of the instruction field.

NASM uses backslash (\) as the line continuation character; if a line ends with backslash, the next line is considered to be a part of the backslash-ended line.
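
As a small illustration of line continuation, the sketch below splits one data declaration across three source lines. The label greeting is a hypothetical name chosen only for this example.

```nasm
; Line continuation: the three physical lines below form one logical line.
section .data
greeting:   db  'hello, ', \
                'world', \
                13, 10          ; assembled as: db 'hello, ','world',13,10
```

The backslash has to be the very last character on the line for the continuation to take effect.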

An identifier may be prefixed with a dollar character ($) to indicate that it is intended to be read as an identifier and not as a reserved word.
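
A minimal sketch of the dollar prefix, assuming a variable that happens to share its name with a register. The variable itself is hypothetical and exists only to show the difference between the two spellings.

```nasm
section .data
$eax:   dd      42              ; hypothetical variable whose name is literally "eax"

section .text
        mov     ebx, $eax       ; ebx = address of the variable named eax
        mov     ecx, eax        ; ecx = contents of the register eax
```

Without the dollar character, NASM would read eax as the register in both instructions.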

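Almost any floating-point instruction that references memory must use one of the prefixes DWORD, QWORD or TWORD to indicate the size of the memory operand it refers to. The same need arises for any instruction whose operand size cannot be inferred from a register operand, such as the one-operand integer multiplication below: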


mov  eax,[vec1+ecx*4]    ; Load element number ecx of vec1
imul dword [vec2+ecx*4]  ; Multiply eax by element number ecx of vec2
                         ; Notice that we have to specify the size
                         ; of the memory operand (dword in this case).
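
For the floating-point case mentioned above, a sketch using the x87 FLD instruction might look like this. The labels f32val, f64val and f80val are hypothetical names introduced only for illustration.

```nasm
section .data
f32val: dd      1.5             ; 4-byte single-precision value
f64val: dq      1.5             ; 8-byte double-precision value
f80val: dt      1.5             ; 10-byte extended-precision value

section .text
        fld     dword [f32val]  ; size prefix selects the 32-bit form
        fld     qword [f64val]  ; size prefix selects the 64-bit form
        fld     tword [f80val]  ; size prefix selects the 80-bit form
```

The three loads use the same mnemonic, so the size prefix is the only thing that tells NASM which encoding to emit.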

Pseudo-instructions

Pseudo-instructions are things which, though not real x86 machine instructions, are used in the instruction field anyway because that's the most convenient place to put them.

Declaring initialized data

NASM defines a number of pseudo-instructions (DB, DW, DD, DQ, DT, DO, DY and DZ) to declare initialized data in the output file.


      db    0x55                ; just the byte 0x55 
      db    0x55,0x56,0x57      ; three bytes in succession 
      db    'a',0x55            ; character constants are OK 
      db    'hello',13,10,'$'   ; so are string constants 
      dw    0x1234              ; 0x34 0x12 
      dw    'a'                 ; 0x61 0x00 (it's just a number) 
      dw    'ab'                ; 0x61 0x62 (character constant) 
      dw    'abc'               ; 0x61 0x62 0x63 0x00 (string) 
      dd    0x12345678          ; 0x78 0x56 0x34 0x12 
      dd    1.234567e20         ; floating-point constant 
      dq    0x123456789abcdef0  ; eight byte constant 
      dq    1.234567e20         ; double-precision float 
      dt    1.234567e20         ; extended-precision float
      dt    3.14159265358979323 ; pi

Declaring uninitialized data

NASM defines a number of pseudo-instructions (RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ) to declare uninitialized data. Each takes a single operand, the number of bytes, words, doublewords and so on to reserve, and is designed to be used in the .bss section of a module.

The BSS (block starting symbol) is the portion of an object file, executable, or assembly language code that contains statically allocated variables that are declared but have not been assigned a value yet. Typically only the length of the .bss section, but no data, is stored in the object file. The program loader allocates memory for the .bss section when it loads the program. Placing variables with no initial value in the .bss section, instead of the .data section, which has to store initial values, reduces the size of the object file.


buffer:         resb    64              ; reserve 64 bytes 
wordvar:        resw    1               ; reserve a word 
realarray:      resq    10              ; array of ten reals 
ymmval:         resy    1               ; one YMM register 
zmmvals:        resz    32              ; 32 ZMM registers
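
To make the contrast between the two sections concrete, here is a minimal sketch. The labels counter and scratch are hypothetical.

```nasm
section .data
counter:    dd      0           ; initialized: these 4 bytes are stored in the object file

section .bss
scratch:    resb    4096        ; uninitialized: only the size (4096 bytes) is recorded
```

Moving scratch into .data as `times 4096 db 0` would reserve the same memory at run time, but the object file would then carry 4096 bytes of zeros.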

Including external binary files

The INCBIN pseudo-instruction includes a binary file verbatim into the output file. It can be called in one of these three ways:


    incbin  "file.dat"             ; include the whole file 
    incbin  "file.dat",1024        ; skip the first 1024 bytes 
    incbin  "file.dat",1024,512    ; skip the first 1024, and 
                                   ; actually include at most 512
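
A common companion to INCBIN is a label before the data and a length constant after it, so the code can locate the embedded bytes. The sketch below assumes a file named file.dat exists in the directory where NASM is run; the names blob and blob_len are hypothetical.

```nasm
section .data
blob:       incbin  "file.dat"      ; the file must exist at assembly time
blob_len    equ     $ - blob        ; number of bytes that were included
```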

Defining constants

EQU defines a symbol with a given constant value: when EQU is used, the source line must contain a label. The action of EQU is to define the given label name to the value of its (only) operand. This definition is absolute, and cannot change later. For example:


message         db      'hello, world' 
msglen          equ     $-message

NASM supports two special tokens in expressions, allowing calculations to involve the current assembly position: the $ and $$ tokens. $ evaluates to the assembly position at the beginning of the line containing the expression; so you can code an infinite loop using JMP $. $$ evaluates to the beginning of the current section; so you can tell how far into the section you are by using ($-$$). In the above example $-message evaluates to 12:


message         db      'hello, world' ; Say message is at address 100
                                       ; message ends at 111 (12 chars)
msglen          equ     $-message      ; 112-100=12
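
The two tokens can be sketched together as follows, assuming each section starts at this point in the source. The labels first, second, here and hang are hypothetical.

```nasm
section .data
first:      db      1, 2, 3, 4      ; occupies offsets 0..3 of .data
second:     db      5               ; occupies offset 4
here        equ     $ - $$          ; 5: bytes emitted so far in this section

section .text
hang:       jmp     $               ; jumps to its own address: an infinite loop
```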

Repeating instructions or data

The TIMES prefix causes the instruction to be assembled multiple times.


zerobuf:        times 64 db 0

The argument to TIMES is not just a numeric constant, but a numeric expression, so you can do things like:


buffer: db      'hello, world'   ; Say buffer address is 100
                                 ; buffer ends at 111 (12 chars)
times 64-$+buffer db ' '         ; 64-112+100 = 52

which will store exactly enough spaces to bring the total length of buffer up to 64.
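
A widely used application of TIMES with an expression is padding a flat binary to a fixed size. The sketch below is not from the original text: it assumes flat binary output (nasm -f bin) and uses a 512-byte PC boot sector purely as a familiar example of a fixed-size image.

```nasm
        bits 16
        org 0x7c00                  ; conventional BIOS boot-sector load address
start:  jmp $                       ; placeholder code: loop forever
        times 510-($-$$) db 0       ; pad with zeros up to offset 510
        dw 0xaa55                   ; boot signature, stored as bytes 0x55 0xaa
```

Here ($-$$) is the number of bytes emitted so far, so the TIMES count shrinks automatically as code is added before it.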

Effective addresses

An effective address is any operand to an instruction which references memory. Effective addresses, in NASM, have a very simple syntax: they consist of an expression evaluating to the desired address, enclosed in square brackets. For example:


wordvar dw      123 
        mov     ax,[wordvar] 
        mov     ax,[wordvar+1] 
        mov     ax,[es:wordvar+bx]

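More complicated effective addresses, such as those involving more than one register, work in exactly the same way. The last line below uses NASM's split syntax, in which a comma separates the base part of the address from the index part: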


        mov     eax,[ebx*2+ecx+offset] 
        mov     ax,[bp+di+8]  
        mov     eax,[ebx+8,ecx*4]   ; ebx=base, ecx=index, 4=scale, 8=displacement
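
The base, index, scale and displacement parts map naturally onto array indexing. The following sketch assumes 32-bit code and a hypothetical array named table.

```nasm
        bits 32
section .data
table:  dd      10, 20, 30, 40      ; hypothetical array of four doublewords

section .text
        mov     ecx, 2              ; element index
        mov     eax, [table+ecx*4]  ; eax = table[2] = 30
        mov     ebx, table          ; base address held in a register
        mov     edx, [ebx+ecx*4+4]  ; edx = table[3] = 40
```

The scale of 4 matches the element size, so the index register can hold an element number rather than a byte offset.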

Constants

Numeric constants

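A numeric constant is simply a number. NASM allows you to specify numbers in a variety of number bases, in a variety of ways: you can suffix H or X, D or T, Q or O, and B or Y for hexadecimal, decimal, octal and binary respectively. You can also prefix 0x (in the style of C) or $ (in the style of Borland Pascal) for hexadecimal; because a $ prefix also marks identifiers, a hexadecimal number written with $ must start with a digit. NASM also accepts the prefixes 0h for hexadecimal, 0d or 0t for decimal, 0o or 0q for octal, and 0b or 0y for binary. Numeric constants can have underscores (_) interspersed to break up long strings of digits.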

Some examples (all producing exactly the same code):


        mov     ax,200          ; decimal 
        mov     ax,0200         ; still decimal 
        mov     ax,0200d        ; explicitly decimal 
        mov     ax,0d200        ; also decimal 
        mov     ax,0c8h         ; hex 
        mov     ax,$0c8         ; hex again: the 0 is required 
        mov     ax,0xc8         ; hex yet again 
        mov     ax,0hc8         ; still hex 
        mov     ax,310q         ; octal 
        mov     ax,310o         ; octal again 
        mov     ax,0o310        ; octal yet again 
        mov     ax,0q310        ; octal yet again 
        mov     ax,11001000b    ; binary 
        mov     ax,1100_1000b   ; same binary constant 
        mov     ax,1100_1000y   ; same binary constant once more 
        mov     ax,0b1100_1000  ; same binary constant yet again 
        mov     ax,0y1100_1000  ; same binary constant yet again

String constants

String constants are character strings used in the context of some pseudo-instructions, namely the DB family and INCBIN (where it represents a filename.) They are also used in certain preprocessor directives. A string constant looks like a character constant, only longer.

The first two lines below are equivalent to each other, and so are the last three:


      db    'hello'               ; string constant 
      db    'h','e','l','l','o'   ; equivalent character constants      
      dd    'ninechars'           ; doubleword string constant 
      dd    'nine','char','s'     ; becomes three doublewords 
      db    'ninechars',0,0,0     ; and really looks like this

Floating-point constants

Floating-point constants are acceptable only as arguments to DB, DW, DD, DQ, DT, and DO, or as arguments to the special operators __?float8?__, __?float16?__, __?float32?__, __?float64?__, __?float80m?__, __?float80e?__, __?float128l?__, and __?float128h?__.

Some examples:


      db    -0.2                    ; "Quarter precision" 
      dw    -0.5                    ; IEEE 754r/SSE5 half precision 
      dd    1.2                     ; an easy one 
      dd    1.222_222_222           ; underscores are permitted 
      dd    0x1p+2                  ; 1.0x2^2 = 4.0 
      dq    0x1p+32                 ; 1.0x2^32 = 4 294 967 296.0 
      dq    1.e10                   ; 10 000 000 000.0 
      dq    1.e+10                  ; synonymous with 1.e10 
      dq    1.e-10                  ; 0.000 000 000 1 
      dt    3.141592653589793238462 ; pi 
      
      mov    rax,__?float64?__(3.141592653589793238462)
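
To show how a stored constant and the __?float64?__ operator relate, here is a sketch assuming 64-bit code and a NASM version that supports the __?float64?__ spelling used above. The label pi is hypothetical.

```nasm
        bits 64
section .data
pi:     dq      3.141592653589793238462     ; stored as an IEEE 754 double

section .text
        movsd   xmm0, [rel pi]              ; load the double from memory
        mov     rax, __?float64?__(3.141592653589793238462)
        movq    xmm1, rax                   ; same bit pattern, built from an immediate
```

Both routes leave the same 64-bit pattern in the low half of the XMM register; the second avoids a data-section entry at the cost of a 10-byte instruction.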