Let there be spawn

Say you want to run a program in Linux. You might think that there is a system call (the essential interface between programs and the operating system) like spawn_process or start_process for starting a new process. Not quite.

The functionality that one could consider “starting a process” is actually split into two distinct phases, and realized by two distinct families of system calls.

fork and exec

In Linux, every process besides the root initialization process (systemd in most current Linux distributions) is initially forked or cloned from its parent process. Clones… clones everywhere!

These cloned processes are subsequently initialized with a call to one of the exec family of system calls in order to run the new program. Fork creates the clone children, and exec makes them eat their brains.

These brain-eating clone children become zombie processes once they finish executing. They linger until the parent process collects their exit status via the wait system call or, if the parent is no longer running, until the init process adopts and reaps them.

To fork or to clone?

Fork is the original UNIX system call for duplicating a process. It is still available on modern Linux systems, but the C library's fork() wrapper now delegates to a newer system call, clone, with a particular set of flags. The clone system call is also used to create new threads within a process. More on the differences between processes and threads in a future post.

Here’s a little C program that will fork the current process and call execve to run the “ls” command. We’ll run strace to trace the system calls that get invoked.

# fork_exec.c

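#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        // This is the child process
        printf("I'm the child process, pid=%d\n", getpid());
        fflush(stdout);  // flush buffered output before exec replaces this program

        // Exec the "ls" command
        char* const argv[] = {"/bin/ls", NULL};
        char* const envp[] = {NULL};
        execve("/bin/ls", argv, envp);

        // execve only returns if it failed
        perror("execve");
        _exit(EXIT_FAILURE);
    } else if (pid == -1) {
        // Oh crap, our fork() failed!
        perror("fork");
        return EXIT_FAILURE;
    } else {
        // This is the parent process
        printf("I'm the parent process, pid=%d\n", getpid());
        wait(NULL);  // reap the child so it doesn't linger as a zombie
    }
    return 0;
}

$ gcc -o fork_exec fork_exec.c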
$ strace -f ./fork_exec
execve("./fork_exec", ["./fork_exec"], [/* 38 vars */]) = 0
...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x76fc9068) = 22046
...
[pid 22046] execve("/bin/ls", ["/bin/ls"], [/* 0 vars */]) = 0
...
[pid 22046] +++ exited with 0 +++
...
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=22046, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---

As you can see, when we call fork() to fork the parent process, the clone system call is what actually gets used. We can also see that when the child process exits, the kernel sends a SIGCHLD signal to the parent process in case it wants to do something about it. More on signals and signal handling in a future post!

Isn’t cloning the current process really expensive?

Not usually. Generally speaking, fork is implemented with “copy-on-write” behavior. Copy-on-write is an optimization technique that lets the child share the parent's memory pages and defers copying a page until one of the two processes attempts to modify it.

In the case of fork() followed by an exec(), this optimization generally avoids the overhead of needless copying.

Why does this matter?

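As we saw in the last post, processes have a lot of attributes, such as open files, environment variables, signal handlers, ownership information, memory, etc. A forked child starts out with a copy of most of these, and many of them survive the call to execve: open file descriptors stay open unless they are marked close-on-exec, and ownership information carries over, while memory is replaced and custom signal handlers are reset to their defaults. These factors matter and may affect what happens when you run your program!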