fulmanski.pl: tutorials

Chapter 5

Combining your assembler with C routines

Initial version: 2025-03-07
Last update: 2025-03-26

Writing your own code to do something, for example to print numbers, is good for your knowledge and understanding of the topic, but in some respects it resembles reinventing the wheel. You do something that has been done before; you write code that has already been written and is ready to use. You spend a lot of time that could be spent on something more creative: something that lets you produce not code that does the same thing as existing code, but code that does something no existing code does yet.

For this reason, in this chapter you will learn how to call functions written in C, such as those from the C standard library, from your assembler code.

Table of contents

Function calling conventions
- GCC 32-bit calling conventions
- GCC 64-bit calling conventions
  - `RAX` value for variable-argument subroutines
First program linked with a C library
- 64-bit basic program linked with a C library
  - The code
  - Making 64-bit program linked with a C library
- 32-bit basic program linked with a C library
Peeking GCC generated assembler
Problems you can try to solve
- Problem 1: command line arguments
- Problem 2: Use `scanf` to read integer or float from user

Function calling conventions

In most cases, to use a function you have to pass it some arguments. This was the topic of the preceding chapter, Second program. As you know from that chapter, you have a few options:

you can use memory at well-known addresses or with well-known labels,
you can use specific registers,
you can use the stack.

Using well-known labels has a disadvantage: you run into trouble when you try to use two different functions that use exactly the same labels for their parameters.

Using specific registers is the fastest method. However, you may have problems when there are not enough registers for all the arguments.

The stack is the most versatile approach, but it is slower because it requires memory access.

As you can see, there is no single obvious method. Everything depends on the details and on the designer's decisions.

When you write assembly language functions that will be linked with C code using GCC, you must obey the GCC calling conventions. With a different tool the convention may be different, because a calling convention is just a convention. It is rarely forced by the architecture and more often results from the designer's choices. For this reason I find it didactically instructive to learn two different calling conventions for the x86 architecture on the Linux operating system: one specific to 32-bit systems and one to 64-bit systems.

GCC 32-bit calling conventions

Parameters are pushed on the stack right to left (the last parameter is pushed first). The caller removes them after the call.
After the parameters are pushed, the CALL instruction is executed. When the called function gets control, the return address is at the top of the stack (at ESP), the first parameter is at ESP + 4, and so on. You use +4 because on a 32-bit system the return address, like every pushed 32-bit value, takes 4 bytes (32 bits = 4 * 8 bits).
If your function uses any of the following registers: EBX, ESI, EDI, EBP, DS, ES or SS, it must save and restore their values. In other words, these values must not change across function calls. Consequently, when you call a function written by another programmer, you can assume these registers will not change (as long as everyone plays by the rules).
A function returns an integer value in EAX and a 64-bit integer in EDX:EAX. It returns a floating-point value on the top of the FPU stack, ST0 (more about the FPU in the next chapter).

GCC 64-bit calling conventions

The 64-bit calling conventions differ from the 32-bit ones in several important points, and they also differ between operating systems. For 64-bit Linux the most important rules are:

Parameters are passed from left to right, and as many of them as will fit are passed in registers. Registers are allocated in the following order:
- For integers and pointers: RDI (first parameter of the function), RSI, RDX, RCX, R8, R9.
- For floating-point values (float, double): XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7.
If needed, additional parameters are pushed on the stack right to left, and the caller removes them after the call.
After the parameters are pushed, the CALL instruction is executed. When the called function gets control, the return address is at RSP, the first stack parameter is at RSP + 8, and so on. Before the CALL, RSP must be aligned to 16 bytes, so at the entry of the called function RSP + 8 is a multiple of 16.
Variable-argument functions require a value in AL (the lowest byte of RAX) giving the number of vector registers used to pass arguments. In other words, when you call a function that takes a variable number of arguments, you must set AL to the number of floating-point arguments passed in vector registers. Strictly speaking, AL is an upper bound on that number, from 0 to 8. See below for more explanation.
The only registers that the called function is required to preserve (the callee-saved registers) are RBP, RBX, R12, R13, R14 and R15 (and, of course, RSP). The called function may change all other registers.
The callee is also supposed to preserve the control bits of MXCSR and the x87 control word.
Integers are returned in RAX or RDX:RAX, and floating-point values are returned in XMM0 or XMM1:XMM0.

`RAX` value for variable-argument subroutines

In the x86_64 ABI, if a function takes a variable number of arguments, AL (the lowest byte of RAX, and so part of EAX) is expected to hold the number of vector registers used to pass arguments to that function. (An ABI, or application binary interface, is an interface between two binary program modules. It defines how data structures or computational routines are accessed in machine code, in a hardware-dependent format. The calling convention, which determines how data is provided as input to, or read as output from, computational routines, is a common aspect of an ABI [abi_01].) For example:


printf("%d", 1);

has an integer argument so there’s no need for a vector register, hence AL is set to 0. On the other hand, if we change this example to:


printf("%f", 1.0f);

then the floating-point literal is passed in a vector register. It is passed as a double, because float arguments to variadic functions are promoted to double. Correspondingly, AL (EAX) is set to 1 (the meaning of rip is explained below) (complete result: C source, assembler):


movq  .LC0(%rip), %rax
movq  %rax, %xmm0
leaq  .LC1(%rip), %rax
movq  %rax, %rdi
movl  $1, %eax
call  printf@PLT

As you may expect the code:


printf("%f %f", 1.0f, 2.0f);

will cause the compiler to set AL (EAX) to 2, since there are two floating-point arguments (complete result: C source, assembler):


movsd   .LC0(%rip), %xmm0
movq    .LC1(%rip), %rax
movapd  %xmm0, %xmm1
movq    %rax, %xmm0
leaq    .LC2(%rip), %rax
movq    %rax, %rdi
movl    $2, %eax
call    printf@PLT

I took all the above examples from compiler output (the section Peeking GCC generated assembler shows how to do that). They are not the result of my own "vision" of how this should be done: you see exactly how it really works.

RIP, relative addressing and position-independent code

x86-64 defines a new addressing mode, relative to the instruction pointer (RIP), to simplify writing position-independent code (see below). It is meant for global (statically allocated) data, not for local variables on the stack. To encode this addressing, just write rip as if it were another register. An instruction of the form:


movl $0x1, 0x10(%rip)

stores the value 0x1 at the address 16 (0x10) bytes after the end of the currently executing instruction. A symbolic displacement is implicitly RIP-relative, so:


movl $0x1, symb(%rip)

will write 0x1 to the address of symbol symb.

This is particularly confusing in Intel syntax, where [rip + symb] suggests a different location than [symb]. In fact it means "compute the rel32 displacement needed to reach symb", not "add the value of symb to RIP":

[rip + 10] (the equivalent of AT&T 10(%rip)) means 10 bytes past the end of this instruction;
[rip + symbol] (the equivalent of AT&T symbol(%rip)) means to calculate a displacement to reach symbol.

Position-independent code

Position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that executes properly regardless of its memory address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use by, for example, other shared libraries. PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

Position-independent code can be executed at any memory address without modification. This differs from absolute code, which must be loaded at a specific location to function correctly, and load-time locatable (LTL) code, in which a linker or program loader modifies a program before execution, so it can be run only from a particular memory location.

If you want to know more, you can read [tnov], or [exe_pack], which is very good but perhaps too complicated on first reading.

First program linked with a C library

64-bit basic program linked with a C library

The code


extern  printf          ; The C function, to be called

global  main            ; Make label available to linker

section .data           ; Data section

text:    db "Hello World!", 10, 0  ; The string to print, 10=LF (newline), 0=null
                        ; printf expects a null-terminated string,
                        ; hence the final 0
 
section .text           ; Code section
 
main:                   ; Standard gcc entry point
        sub  rsp, 8     ; The call to main pushed an 8-byte return address;
                        ; re-align the stack to 16 bytes as the ABI requires
        mov  rdi, text  ; 64-bit ABI passing order: RDI, RSI, ...
        mov  rax, 0     ; printf is varargs, so AL holds the number of
                        ; vector (XMM) registers used for arguments
        call printf     ; Call the C function
        add  rsp, 8     ; Undo the stack adjustment

; Exit
        mov  rax,0      ; Normal, no error, return value
        ret             ; Return
; End of the code

Making 64-bit program linked with a C library

Years ago I compiled this code with:


fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf64 hello_c_64.asm -o hello_c_64.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc hello_c_64.o -o hello_c_64
fulmanp@fulmanp-ThinkPad-T540p:~$ ./hello_c_64
Hello World!

Now the same commands produce an error:


fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf64 hello_c_64.asm -o hello_c_64.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc hello_c_64.o -o hello_c_64
/usr/bin/ld: warning: hello_c_64.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
/usr/bin/ld: hello_c_64.o: warning: relocation in read-only section `.text'
/usr/bin/ld: hello_c_64.o: relocation R_X86_64_PC32 against symbol `printf@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

You can fix this in two ways:

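Solution 1: use the -no-pie GCC flag. It produces a traditional, non-position-independent executable, so absolute addresses such as the one in mov rdi, text are allowed: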


fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf64 hello_c_64.asm -o hello_c_64.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc hello_c_64.o -o hello_c_64 -no-pie
/usr/bin/ld: warning: hello_c_64.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
fulmanp@fulmanp-ThinkPad-T540p:~$ ./hello_c_64 
Hello World!

Solution 2: change how you refer to the data and call the function, using relative addressing:


bits    64              ; Tell NASM to generate 64-bit code
default rel             ; Use RIP-relative addressing by default

extern  printf          ; The C function to be called

[... CUT THIS CODE FOR BETTER READABILITY ...]
 
main:                   ; Standard gcc entry point
        sub  rsp, 8     ; Re-align the stack to 16 bytes
        lea  rdi, [text]  ; RIP-relative address of text -> RDI (1st argument)
        mov  rax, 0     ; printf is varargs, so AL holds the number of
                        ; vector (XMM) registers used for arguments
        call printf wrt ..plt    ; Call printf through the PLT
        add  rsp, 8     ; Undo the stack adjustment

; Exit
[... CUT THIS CODE FOR BETTER READABILITY ...]

On 16- and 32-bit architectures the usual addressing mode was absolute: you used the syntax MOV reg, address, and the address was a constant 16- or 32-bit number. On 64-bit systems, libraries and executables are typically compiled in position-independent mode, which requires relative addressing. Instead of treating addresses as 64-bit numbers, the assembler/compiler computes them as the distance between the current instruction and the target location in memory. Position-independent code can be loaded at any location in memory. For this reason, your executable does not know the addresses (absolute or relative) of external functions such as printf, which live in a shared library, until it is loaded. In NASM, if you write call printf wrt ..plt, the call actually jumps to an entry in the PLT (Procedure Linkage Table). That entry jumps to the address stored for printf in the GOT (Global Offset Table). With lazy binding, the GOT entry initially leads to a lookup routine. The first call therefore resolves the real address of printf and stores it in the GOT, and later calls go straight to the function.

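The LEA instruction computes the effective address of its second (source) operand and stores it in its first (destination) operand, which must be a general-purpose register. It does not read memory. With default rel, lea rdi, [text] therefore simply puts the RIP-relative address of text into RDI.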


fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf64 hello_c_64_rel.asm -o hello_c_64_rel.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc hello_c_64_rel.o -o hello_c_64_rel
/usr/bin/ld: warning: hello_c_64_rel.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker

32-bit basic program linked with a C library

Today it is highly unlikely that you will work on a 32-bit system. It is also cumbersome to present every program in two versions, one for 32-bit and one for 64-bit systems. For this reason I will mostly focus on 64-bit code. However, if for some reason you want to work with 32-bit code, you can build it with the following commands:


fulmanp@fulmanp-ThinkPad-T540p:~$ nasm -f elf hello.asm -o hello.o
fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -m32 hello.o -o hello -no-pie
/usr/bin/ld: warning: hello.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
fulmanp@fulmanp-ThinkPad-T540p:~$ ./hello 
status word value 14340

When you run the gcc command, you will probably see an error:


fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -m32 fpu_test_01_32.o -o fpu_test_01_32
/usr/bin/ld: cannot find Scrt1.o: No such file or directory
/usr/bin/ld: cannot find crti.o: No such file or directory
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/13/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc: No such file or directory
/usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/13/libgcc.a when searching for -lgcc
/usr/bin/ld: cannot find -lgcc: No such file or directory
collect2: error: ld returned 1 exit status

The problem is that you most likely have GCC support files only for your current architecture, which is 64-bit. To build 32-bit programs you also need the 32-bit support files: start files such as crti.o and 32-bit versions of libraries such as libgcc. On Debian/Ubuntu you can install them with the command:


sudo apt install gcc-multilib

Peeking GCC generated assembler

Sometimes, when you run into trouble writing your own assembler code, it is very helpful to inspect working code generated by tools such as GCC. Consider the following program, simple_printf_64.c:


#include <stdio.h>

int main()
{
  double  flt1=1.234e-3;

  printf("printf float=%e\n", flt1);
  return 0;
}

To get assembler code you can type:


fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -S simple_printf_64.c -o simple_printf_64_dis.s

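The output file uses AT&T syntax. Note that an old compiler produced this listing (GCC 4.6.3, as the .ident line shows). That compiler did not build position-independent executables by default, so the address of the string is loaded as an absolute value ($.LC1) rather than RIP-relative as in the examples earlier in this chapter: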


  .file  "simple_printf_64.c"
  .section  .rodata
.LC1:
  .string  "printf float=%e\n"
  .text
  .globl  main
  .type  main, @function
main:
.LFB0:
  .cfi_startproc
  pushq  %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movabsq  $4563333643445681349, %rax
  movq  %rax, -8(%rbp)
  movl  $.LC1, %eax
  movsd  -8(%rbp), %xmm0
  movq  %rax, %rdi
  movl  $1, %eax
  call  printf
  movl  $0, %eax
  leave
  .cfi_def_cfa 7, 8
  ret
  .cfi_endproc
.LFE0:
  .size  main, .-main
  .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
  .section  .note.GNU-stack,"",@progbits

To get code in Intel syntax, use:


fulmanp@fulmanp-ThinkPad-T540p:~$ gcc -S -masm=intel simple_printf_64.c -o simple_printf_64_dis.asm

The resulting file is the same as before, except for the syntax:


  .file  "simple_printf_64.c"
  .intel_syntax noprefix
  .section  .rodata
.LC1:
  .string  "printf float=%e\n"
  .text
  .globl  main
  .type  main, @function
main:
.LFB0:
  .cfi_startproc
  push  rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  mov  rbp, rsp
  .cfi_def_cfa_register 6
  sub  rsp, 16
  movabs  rax, 4563333643445681349
  mov  QWORD PTR [rbp-8], rax
  mov  eax, OFFSET FLAT:.LC1
  movsd  xmm0, QWORD PTR [rbp-8]
  mov  rdi, rax
  mov  eax, 1
  call  printf
  mov  eax, 0
  leave
  .cfi_def_cfa 7, 8
  ret
  .cfi_endproc
.LFE0:
  .size  main, .-main
  .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
  .section  .note.GNU-stack,"",@progbits

Another method is to disassemble a compiled file:


fulmanp@fulmanp-ThinkPad-T540p:~$ gcc simple_printf_64.c -o simple_printf_64_dis
fulmanp@fulmanp-ThinkPad-T540p:~$ objdump -d --disassembler-options=intel simple_printf_64_dis

simple_printf_64_dis:     file format elf64-x86-64


Disassembly of section .init:

[... cut ...]

00000000004004f4 <main>:
  4004f4: 55                    push   rbp
  4004f5: 48 89 e5              mov    rbp,rsp
  4004f8: 48 83 ec 10           sub    rsp,0x10
  4004fc: 48 b8 c5 3c 2b 69 c5  movabs rax,0x3f5437c5692b3cc5
  400503: 37 54 3f 
  400506: 48 89 45 f8           mov    QWORD PTR [rbp-0x8],rax
  40050a: b8 1c 06 40 00        mov    eax,0x40061c
  40050f: f2 0f 10 45 f8        movsd  xmm0,QWORD PTR [rbp-0x8]
  400514: 48 89 c7              mov    rdi,rax
  400517: b8 01 00 00 00        mov    eax,0x1
  40051c: e8 cf fe ff ff        call   4003f0 <printf@plt>
  400521: b8 00 00 00 00        mov    eax,0x0
  400526: c9                    leave  
  400527: c3                    ret    
  400528: 90                    nop
  400529: 90                    nop
  40052a: 90                    nop
  40052b: 90                    nop
  40052c: 90                    nop
  40052d: 90                    nop
  40052e: 90                    nop
  40052f: 90                    nop

[... cut ...]

Problems you can try to solve

Problem 1: command line arguments

Your task is to print the number of command-line arguments passed to your program, as well as the arguments themselves. At first sight this task may seem to require some special knowledge. It does not, once you realise that in C, running your program means calling the main function with the following signature:


int main(int argc, char* argv[])

In other words, main is called like any other function, so if you know the calling convention you should be able to get the values of argc and argv.

Problem 2: Use `scanf` to read integer or float from user

In a higher-level programming language (C) this would be:

integer case


#include <stdio.h>

int main() {
    int n;

    scanf("%d", &n); 
    printf("%d", n);
    return 0;
}

float case


#include <stdio.h>

int main() {
    double n;

    scanf("%lf", &n); 
    printf("%lf", n);
    return 0;
}

If you succeed, you can try to read the data into an array:

integer case


#include <stdio.h>

int main() {
    int n[2];

    scanf("%d %d", &n[0], &n[1]); 
    printf("%d %d", n[0], n[1]);
    return 0;
}

float case


#include <stdio.h>

int main() {
    double n[2];

    scanf("%lf %lf", &n[0], &n[1]); 
    printf("%lf %lf", n[0], n[1]);
    return 0;
}