Running glibc applications on a musl libc Void Linux installation

1 January 2021

#linux #musl #voidlinux

Some time ago I got tired of Gentoo and decided to try Void Linux. Void officially supports musl libc, so I thought "why not?" and installed it on my main work machine.

Everything was great: I was (and still am) really impressed by Void, and I doubt I will go back to Gentoo anytime soon. But there was one little thing I hadn't thought of at the time of installation: proprietary software.

Proprietary software for Linux is almost always linked against glibc, and nobody provides builds for alternative libc implementations such as musl. I wanted to run at least the Vivaldi browser and JetBrains IDEs (PhpStorm, CLion, PyCharm, Android Studio, etc.), so I started looking for solutions. (Of course I could just go back and reinstall with glibc, but where's the fun in that?)

The Void documentation suggests installing a glibc base system via base-bootstrap into a separate directory (I'll refer to it as the glibc container), chrooting into it and running glibc binaries from within that chroot. Using chroot seemed too primitive and inconvenient, even with scripts to automate all the steps, so I kept looking.

Then I found this blog post describing another way: using mount_namespaces(7) as an elegant alternative to chroot. The idea is simple: create a new mount namespace, bind-mount the glibc container's /usr over your real /usr (so the real /usr is replaced by the "fake" one, where all binaries and libraries are glibc-linked), and launch your glibc program inside this mount namespace. These mounts are invisible to the rest of the system. Cool, isn't it?

The code in the blog post wasn't anywhere near "production-ready", though, so I decided to write a new, ready-to-use program based on this idea.

First attempt: writing voidnsrun

To use mount namespaces, one needs root privileges. It's possible to use user namespaces instead, but they break setuid binaries (or maybe I just don't know how to set them up properly), so I decided to stick with good old mount namespaces. To be able to use them, either you need to be root, or the binary must be owned by root and have the setuid bit set. We'll do the latter, but remember that setuid in general is a security risk, and one has to be very careful when writing setuid programs.

Disclaimer: all code in this article only demonstrates the ideas. In real programs you need to check the return values of almost all function calls and do a lot of other stuff.

The program is supposed to be used like this:

$ voidnsrun glibc_program --arguments ...

where voidnsrun is the name of the program we're going to write, glibc_program is the name of our glibc program and --arguments ... are its arguments.

I'll omit all the boring stuff like argument processing, user input validation and so on; if you're interested in that, read the code of the final program. So let's get straight to the task.

First, we need to create a new mount namespace, or in other words "unshare" the process's current namespace and switch to a private copy of it. Use unshare(2) for that:

unshare(CLONE_NEWNS);

Then bind-mount the /usr of your glibc container (which, for simplicity, I'll assume resides in /glibc) over the real /usr. This mount won't be visible outside the new namespace:

mount("/glibc/usr", "/usr", NULL, MS_BIND|MS_REC, NULL);

(Note that you should also bind-mount /etc and /var when you use xbps utilities such as xbps-install.)
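For reference, the two steps so far can be combined into one function with error checking. This is a minimal sketch, not code from voidnsrun: the name enter_glibc_usr is hypothetical, and the MS_PRIVATE step is an assumption added on top of the code above. It matters on systems where / is mounted with shared propagation (the default under systemd): there, mounts made in the new namespace could propagate back into the original one, which is also why unshare(1) makes the tree private by default.

```c
#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_NEWNS */
#include <stdio.h>      /* perror() */
#include <sys/mount.h>  /* mount(), MS_* flags */

/* Hypothetical helper (not from voidnsrun): move the calling process into a
 * new mount namespace and bind-mount glibc_usr (e.g. "/glibc/usr") over /usr.
 * Needs root. Returns 0 on success, -1 on failure. */
int enter_glibc_usr(const char *glibc_usr)
{
    /* Get a private copy of the current mount namespace. */
    if (unshare(CLONE_NEWNS) == -1) {
        perror("unshare");
        return -1;
    }

    /* Assumption: / may have shared propagation (as under systemd). Making
     * the whole tree private keeps the mounts below from propagating back
     * into the original namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
        perror("make / private");
        return -1;
    }

    /* Replace /usr; the change is visible only inside the new namespace. */
    if (mount(glibc_usr, "/usr", NULL, MS_BIND | MS_REC, NULL) == -1) {
        perror("bind-mount /usr");
        return -1;
    }
    return 0;
}
```

With the container in /glibc, voidnsrun would call enter_glibc_usr("/glibc/usr") while still root, then drop privileges and exec the program.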

As we don't need root anymore, drop it:

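uid_t uid = getuid();
gid_t gid = getgid();
/* Drop the group first, while we are still root, then the user.
 * If either call fails, never continue with root privileges. */
if (setregid(gid, gid) == -1 || setreuid(uid, uid) == -1)
    exit(1);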

And finally, launch our program:

execvp(argv[1], (char *const *)argv+1);

Surprisingly, it works. It launches your glibc program and, like Steve Jobs used to say, it just works.

But... try to launch some complex software this way, specifically software that launches other programs. You'll notice that something's not right.

Let's take JetBrains PhpStorm as an example:

it has a built-in terminal;
it can run your host's php;
it can open its help and documentation in a browser;
it can use fonts installed to /usr/share/fonts;
and so on...

Got it already? PhpStorm lives in its own private mount namespace with a substituted /usr, so every child process it creates inherits this namespace. When you open the built-in terminal, it will work, but it will actually be the glibc-linked /glibc/usr/bin/bash from the bind-mounted /usr. Everything you have installed in your real /usr will be missing unless you have also installed it in the glibc container. For example, if you don't have a duplicate of Firefox there, PhpStorm won't be able to open a browser.

What to do then? One "solution" would be to duplicate all the programs you need in the glibc container, but that's a mess and not a real solution, if you ask me.

Even weirder stuff happens in CLion. When you compile a program, it produces glibc binaries, because it runs the glibc compiler, the glibc linker, etc. You can launch and test the compiled program from the IDE's built-in terminal, but not outside it, unless you rebuild it with musl first. Oh well.

So I started thinking and eventually came up with an idea. What happens if you put together mount namespaces, bind mounts, forks and unix sockets?

Second attempt: writing voidnsundo

Let's focus on the PhpStorm example:

we want the browser to work, and it must be the same browser we use on our musl system;
we want the built-in terminal to run the original (or "native") shell of our musl system;
we don't want to duplicate all our fonts in /glibc.

We've learned how a process can create its own private mount namespace. All children of this process will inherit that namespace.

But there is also the setns(2) system call, which allows the calling process (actually, thread, but we're writing single-threaded programs here, so let's not complicate things) to join another namespace. It accepts a file descriptor referring to a namespace and a namespace type.

So it seems we need to learn how to launch processes in the original (or "parent") namespace from within the glibc container's namespace. If PhpStorm could just launch /usr/bin/firefox in the right mount namespace, that would solve our problem.

Let's write a program called voidnsundo that does the opposite of what voidnsrun does: it switches to the original namespace and launches the specified command. voidnsundo is supposed to be used from within the glibc container, so it needs to be glibc-linked. You might need to install make and gcc into the glibc container:

$ sudo voidnsrun xbps-install make gcc

But to use setns(2), we need a file descriptor that refers to the target namespace; in our case, it's the namespace voidnsrun was in before calling unshare(2). So we need to improve voidnsrun before we can write voidnsundo.

Improving voidnsrun

A file descriptor referring to a process's mount namespace can be obtained by open(2)-ing the /proc/<pid>/ns/mnt pseudo-file (/proc/self/ns/mnt for the current process). So voidnsrun can get this file descriptor before unshare(2)-ing:

/* O_CLOEXEC: don't leak this fd into the program we exec later. */
int nsfd = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
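To see which mount namespace a process is in, it helps to read the target of that same link instead of opening it: it looks something like mnt:[4026531840], and processes in the same mount namespace show the same value. The two helpers below, mnt_ns_id and in_mnt_ns, are hypothetical debugging aids sketched for this article, not part of voidnsrun.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the identity of the caller's mount namespace,
 * the target of the /proc/self/ns/mnt link (e.g. "mnt:[4026531840]"),
 * into buf. Returns 0 on success, -1 on error or if buf is too small. */
int mnt_ns_id(char *buf, size_t size)
{
    ssize_t n = readlink("/proc/self/ns/mnt", buf, size);
    if (n == -1) {
        perror("readlink /proc/self/ns/mnt");
        return -1;
    }
    if ((size_t)n >= size)
        return -1;          /* result may have been truncated */
    buf[n] = '\0';          /* readlink() does not NUL-terminate */
    return 0;
}

/* Hypothetical helper: 1 if the caller is (still) in the namespace whose
 * identity was saved in id, 0 if not, -1 on error. */
int in_mnt_ns(const char *id)
{
    char now[64];
    if (mnt_ns_id(now, sizeof now) == -1)
        return -1;
    return strcmp(id, now) == 0;
}
```

For example, printing mnt_ns_id() in voidnsrun before and after unshare(2) should show two different values, and printing it in voidnsundo after setns(2) should show the first value again.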

But how can another program access it later?

The answer is unix sockets.

It's possible to send and receive file descriptors over unix sockets. I won't paste the full fd-sending and fd-receiving functions into the article (you can see them here); I'll just declare them like so:

int send_fd(int sock, int fd);
int recv_fd(int sock);
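Since the original functions are only linked, here is one common way to implement those two declarations, following the SCM_RIGHTS pattern from unix(7) and cmsg(3). Treat it as a sketch; it is not necessarily what voidnsrun does. The descriptor travels as ancillary data, and on a stream socket at least one byte of ordinary data has to be sent along with it.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send fd over the connected unix socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char dummy = 'x';                     /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {                               /* correctly aligned cmsg buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof u);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;         /* "this message carries fds" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from sock. Returns the new fd, or -1. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new descriptor in the receiving process (usually with a different number) that refers to the same open file, which is exactly what voidnsundo needs for setns(2).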

So voidnsrun will get the fd, unshare, and then create a tiny tmpfs filesystem that will only be visible in the unshared namespace:

mount("tmpfs", "/run/voidnsrun", "tmpfs", 0, "size=4k,mode=0700,uid=0,gid=0");

The tmpfs at /run/voidnsrun will be used to store a socket file. A user may launch multiple voidnsrun instances, and this trick makes it possible for each instance's socket to have the same /run/voidnsrun/sock path, while in each namespace that path refers to a different socket.

Then it forks:

pid_t ppid_before_fork = getpid();
pid_t pid = fork();

In the child process, voidnsrun starts a unix socket server that simply sends nsfd to every client that connects. The server should stop, and this child process should die, when the parent dies, so take care of that too.

volatile sig_atomic_t term_caught = 0;

void onterm(int sig)
{
    term_caught = 1;
}

if (pid == 0) {
    /* Catch SIGTERM: it will be sent here when the parent dies. SA_RESTART
     * is not set, so the signal interrupts the accept() call and we can
     * clean up and exit immediately. */
    struct sigaction sa = {0};
    sa.sa_handler = onterm;
    sigaction(SIGTERM, &sa, NULL);

    /* Ignore SIGINT. Ctrl+C goes to the whole foreground process group,
     * so it would hit this child as well as the parent. */
    signal(SIGINT, SIG_IGN);

    /* Set the child to get SIGTERM when the parent thread dies. */
    prctl(PR_SET_PDEATHSIG, SIGTERM);

    /* Maybe it already has died? */
    if (getppid() != ppid_before_fork) {
        // exit logic here
    }

    /* Create unix socket. */
    int sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un sock_addr = {0};
    sock_addr.sun_family = AF_UNIX;

    strcpy(sock_addr.sun_path, "/run/voidnsrun/sock");

    bind(sock_fd, (struct sockaddr *)&sock_addr, sizeof(sock_addr));
    listen(sock_fd, 1);

    /* Accept incoming connections until SIGTERM. */
    while (!term_caught) {
        int sock_conn = accept(sock_fd, NULL, NULL);
        if (sock_conn == -1)
            continue;
        send_fd(sock_conn, nsfd);
        close(sock_conn);
    }

    /* Don't fall through into the parent's code path. */
    close(sock_fd);
    _exit(0);
}
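The server setup above skips error handling entirely. A sketch of the same socket setup with checks might look like this; listen_unix is a hypothetical name. It also removes a stale socket file before bind(2), which would otherwise fail with EADDRINUSE. That is safe here because /run/voidnsrun is a private tmpfs.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a listening unix stream socket at path.
 * Returns the listening fd, or -1 on error. */
int listen_unix(const char *path)
{
    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;

    /* sun_path is a small fixed-size array; refuse paths that don't fit. */
    if (strlen(path) >= sizeof addr.sun_path)
        return -1;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }

    /* Remove a leftover socket file, if any; ignore "doesn't exist". */
    unlink(path);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 1) == -1) {
        perror("bind/listen");
        close(fd);
        return -1;
    }
    return fd;
}
```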

In the parent process, as before, drop root and launch the program:

if (pid != 0) {
    uid_t uid = getuid();
    gid_t gid = getgid();
    /* Group first, then user; never continue as root on failure. */
    if (setregid(gid, gid) == -1 || setreuid(uid, uid) == -1)
        exit(1);

    execvp(argv[1], (char *const *)argv+1);
}

Now we're ready to write the "undo" program.

Writing voidnsundo

Get the file descriptor from voidnsrun:

int sockfd, nsfd;

sockfd = socket(AF_UNIX, SOCK_STREAM, 0);

struct sockaddr_un sock_addr = {0};
sock_addr.sun_family  = AF_UNIX;
strcpy(sock_addr.sun_path, "/run/voidnsrun/sock");

connect(sockfd, (struct sockaddr *)&sock_addr, sizeof(sock_addr));
nsfd = recv_fd(sockfd);

Switch to the original mount namespace:

setns(nsfd, CLONE_NEWNS);

Drop root and launch the program:

uid_t uid = getuid();
gid_t gid = getgid();
/* Group first, then user; never continue as root on failure. */
if (setregid(gid, gid) == -1 || setreuid(uid, uid) == -1)
    exit(1);

execvp(argv[1], (char *const *)argv+1);

As you should remember, voidnsundo is supposed to be used from within the glibc container, so it has to be built with glibc tools. You can use voidnsrun for that: just enter the glibc container with voidnsrun /bin/bash and compile it as usual.

It's time for a demo. In it, we successfully enter the private namespace with voidnsrun and then switch back to the original one with voidnsundo.

One more thing

Okay, so now we can launch programs in the parent namespace from within the glibc container. But how does this help with PhpStorm being unable to launch a browser because there's no /usr/bin/firefox in the glibc container, or with the built-in terminal not being musl-native? Bind mounts, that's how!

In voidnsrun, after the unshare(2) call (and after replacing /usr, since the new /usr would otherwise hide these mounts), we bind-mount our voidnsundo binary onto /usr/bin/bash and /usr/bin/firefox. (If there's no /usr/bin/firefox file, create an empty one there before mounting.) As you might guess, these bind mounts are only visible in the unshared namespace.

bool exists(const char *s);
bool mkfile(const char *s);

const char *s = "/usr/bin/firefox";

if (!exists(s)) 
    mkfile(s);

mount("/usr/local/bin/voidnsundo", s, NULL, MS_BIND, NULL);

// same for /usr/bin/bash or other targets
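exists() and mkfile() are only declared above. Here is a minimal sketch of what they could look like, assuming exists() should report any kind of file and mkfile() only has to produce an empty mount point; both bodies are hypothetical, not taken from voidnsrun.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical implementation: true if anything (file, directory, ...)
 * exists at path s. */
bool exists(const char *s)
{
    struct stat st;
    return stat(s, &st) == 0;
}

/* Hypothetical implementation: create an empty regular file at s to serve
 * as a bind-mount target. Its contents and mode don't matter, since the
 * bind mount hides them. O_EXCL makes it fail rather than touch a file
 * that already exists. */
bool mkfile(const char *s)
{
    int fd = open(s, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return false;
    close(fd);
    return true;
}
```

O_EXCL is a deliberate choice: if something appeared at the path in the meantime, mkfile() fails instead of truncating it.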

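In voidnsundo, add a check for whether it has been launched as voidnsundo itself:

char realpath_buf[PATH_MAX];
/* "bound" means we were started through one of the bind mounts
 * (e.g. as /usr/bin/firefox) rather than as voidnsundo. */
bool bound = strcmp(basename(argv[0]), "voidnsundo") != 0;
if (bound) {
    ssize_t bytes = readlink("/proc/self/exe", realpath_buf, sizeof(realpath_buf) - 1);
    if (bytes == -1)
        exit(1);
    realpath_buf[bytes] = '\0';  /* readlink() doesn't NUL-terminate */
}

When executing the program, check it too: if voidnsundo is "bound", or in other words has been launched not as voidnsundo but as some other program, launch that program; otherwise launch argv[1] as before:

if (bound) {
    argv[0] = realpath_buf;
    execvp(argv[0], (char *const *)argv);
} else {
    execvp(argv[1], (char *const *)argv+1);
}

This trick makes voidnsundo, bind-mounted onto /usr/bin/firefox, launch the real /usr/bin/firefox in the original namespace. The same goes for the shell. Pure magic!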
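The POSIX version of basename(3) may modify its argument, and readlink(2) neither NUL-terminates its result nor reports truncation except by filling the whole buffer, so this check is easy to get subtly wrong. A sketch of two hypothetical helpers that avoid both problems:

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if the last path component of argv0 equals
 * name. Unlike POSIX basename(3), it never modifies argv0. */
bool invoked_as(const char *argv0, const char *name)
{
    const char *slash = strrchr(argv0, '/');
    return strcmp(slash ? slash + 1 : argv0, name) == 0;
}

/* Hypothetical helper: store the path of the running executable, as
 * reported by /proc/self/exe, in buf. Returns 0 on success, -1 on error
 * or if the path did not fit (readlink() truncates silently). */
int self_exe(char *buf, size_t size)
{
    ssize_t n = readlink("/proc/self/exe", buf, size);
    if (n == -1 || (size_t)n >= size)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

voidnsundo would then treat !invoked_as(argv[0], "voidnsundo") as the "bound" case and call self_exe() to find the path to exec. The trick relies on /proc/self/exe reporting the path the binary was executed through (for example /usr/bin/firefox), not /usr/local/bin/voidnsundo.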

Preserving some /usr subdirectories

Okay, we've got this far, and we've fixed the browser and the shell in PhpStorm. But what about fonts? We replace the whole /usr when creating a mount namespace for the glibc container to launch our app, and fonts are usually installed to /usr/share/fonts. Of course, you could just duplicate them in the glibc container, even using sudo voidnsrun xbps-install <font-package> for that, but that's no fun. We need the whole /usr to be the glibc /usr, except for the share/fonts subdirectory. How can we achieve that?

It turns out bind mounts work here too, and Linux doesn't prohibit such weird things.

In voidnsrun, after unsharing but before mounting /usr, create a tmpfs at /oldroot (the directory has to exist already) and bind-mount the real /usr to /oldroot/usr:

// it's convenient to use tmpfs here because it will be destroyed when the
// last process exits the namespace, along with all files and mounts inside it

mount("tmpfs", "/oldroot", "tmpfs", 0, "size=4k,mode=0700,uid=0,gid=0");

struct stat st;
stat("/usr", &st);
mkdir("/oldroot/usr", st.st_mode);

mount("/usr", "/oldroot/usr", NULL, MS_BIND|MS_REC, NULL);

Then, after mounting /glibc/usr over /usr, bind-mount some directories back from /oldroot:

mount("/oldroot/usr/share/fonts", "/usr/share/fonts", NULL, MS_BIND|MS_REC, NULL);

Ha-ha, that works!
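With more than one directory to preserve, the same mount call can be wrapped in a small helper and called in a loop. A sketch with a hypothetical preserve() function, assuming the /oldroot layout above and that each target directory already exists inside the glibc container (a mount point has to exist):

```c
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: bind-mount path (e.g. "/usr/share/fonts") back
 * from the copy of the real tree kept under oldroot (e.g. "/oldroot").
 * Call it after /usr has been replaced. Needs root; path must already
 * exist as a mount point in the glibc container. Returns 0 or -1. */
int preserve(const char *oldroot, const char *path)
{
    char src[4096];   /* PATH_MAX on Linux */
    int n = snprintf(src, sizeof src, "%s%s", oldroot, path);
    if (n < 0 || (size_t)n >= sizeof src)
        return -1;    /* source path would be truncated */

    if (mount(src, path, NULL, MS_BIND | MS_REC, NULL) == -1) {
        perror(src);
        return -1;
    }
    return 0;
}
```

For example, preserve("/oldroot", "/usr/share/fonts") performs the same mount as the line above.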

The End

You can download the ready-to-use program here: https://github.com/gch1p/voidnsrun

It supports some runtime configuration, and you can specify with arguments what you want to mount and where. I use it daily and it seems to be pretty stable. Of course, there may be bugs; if you find any, contact me.

If you have any comments, contact me by email.