Secure systemd containers the easy way

systemd-nspawn is a great tool if you want to build cross-distro packages, run some apps securely, or just install packages that are not available for your distribution.

I’m gonna use Arch Linux, but you can use any distribution with systemd. There is no need to install anything, as systemd-nspawn is included in the systemd package on Arch Linux.

Go grab an image

I’ll install OpenSuse Leap in systemd-nspawn.
You can download images from the official site, but those also include a kernel, which we don’t need, and they are quite big for our use case. There is an alternative: the LXC project builds images for many distributions, and these are much smaller. On their build server, search for OpenSuse and click on image-opensuse.

From there, press on the green ball and download the rootfs.tar.xz file.

You can also tell systemd directly to download and configure the image for you with this easy command:

❯ sudo machinectl pull-tar "https://jenkins.linuxcontainers.org/job/image-opensuse/architecture=amd64,release=15.1,variant=default/lastSuccessfulBuild/artifact/rootfs.tar.xz"

But the download often fails, and when you retry it, it starts all over again. So just use your browser instead: when the download fails, retry it and Firefox will automagically resume from the last position.
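If you would rather stay in the terminal, a download tool that can resume does the same job as the browser. Here is a minimal sketch, assuming curl is installed; fetch_rootfs is a hypothetical helper name, not something systemd provides.

```shell
# fetch_rootfs URL OUTFILE
# Download URL to OUTFILE, continuing a partial file instead of restarting.
fetch_rootfs() {
    # -L follows redirects; "-C -" tells curl to work out the resume offset
    # from the size of the partial OUTFILE already on disk.
    curl -L -C - -o "$2" "$1"
}
```

Call it with the rootfs.tar.xz URL from the machinectl command above and an output name such as rootfs.tar.xz; if the transfer breaks, run the same command again and it should pick up where it stopped. wget -c does the same thing if you prefer wget.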

You can also use debootstrap to create a Debian or Ubuntu image, but it may take more time than downloading a premade image, depending on your hardware.

❯ cd /var/lib/machines
❯ sudo debootstrap --include=systemd-container \
              --components=main \
              stable DebianStable http://deb.debian.org/debian/

The above command creates a Debian Stable installation named DebianStable.

Import the image

Now import the tar file. You might wanna rename rootfs.tar.xz to a name of your liking, since the machine name is taken from the file name. I renamed my rootfs.tar.xz to OpenSuseLeap.tar.xz.

❯ sudo machinectl import-tar /home/smit/Downloads/OpenSuseLeap.tar.xz
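Renaming is optional, by the way: machinectl import-tar accepts the image name as a second argument, so the tarball can keep its original file name. Here is a small sketch; import_rootfs is a hypothetical wrapper name.

```shell
# import_rootfs TARBALL NAME
# Import TARBALL as a machine image called NAME, then list the known images
# so you can confirm that it arrived.
import_rootfs() {
    sudo machinectl import-tar "$1" "$2" &&
        machinectl list-images
}
```

For example, import_rootfs /home/smit/Downloads/rootfs.tar.xz OpenSuseLeap should leave you with an OpenSuseLeap image under /var/lib/machines.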

Configure the image

Now, to set up the machine-id and the root password, you’ll have to get a shell inside the image without actually booting it. It’s more like a normal chroot.

❯ sudo systemd-nspawn -M OpenSuseLeap

Now set a password for root

❯ passwd root

and then set the machine-id

❯ systemd-machine-id-setup 
❯ ln -sf /etc/machine-id /var/lib/dbus/machine-id

Also, if you are going to use the host’s networking, it might be a good idea to disable the wicked network manager

❯ systemctl disable wicked.service

You can also set the hostname

❯ echo GreenGecko > /etc/hostname

The /etc/machine-id file contains the unique machine ID of the local system, which is normally set during installation or boot. Since we are using LXC images, we have to generate it manually; otherwise we’ll get an error stating that no valid machine-id was found.
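If you want to double-check the result before booting, a valid machine ID is a single line of 32 lowercase hexadecimal characters. Here is a minimal sketch to run inside the container shell; valid_machine_id is a hypothetical helper name.

```shell
# valid_machine_id [FILE]
# Succeeds when FILE (default: /etc/machine-id) contains a line made of
# exactly 32 lowercase hexadecimal characters.
valid_machine_id() {
    grep -Eqx '[0-9a-f]{32}' "${1:-/etc/machine-id}"
}
```

Something like valid_machine_id && echo ok will tell you whether systemd-machine-id-setup did its job.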

Now start your container with

❯ sudo systemd-nspawn -bU -M OpenSuseLeap

With -b, OpenSuse’s systemd will start as init, and -U runs the container in a user namespace if the kernel supports it. Arch Linux, at least as of this writing, supports user namespaces by default.

Just enter the following command to check:

❯ sysctl kernel.unprivileged_userns_clone
kernel.unprivileged_userns_clone = 1

If it’s set to 1, user namespaces are enabled. Otherwise, you can set it to 1 manually like this:

❯ sudo sysctl kernel.unprivileged_userns_clone=1

Some people believe that enabling user namespaces is not secure and that they can introduce security vulnerabilities.

On the other hand, running containers in a user namespace provides better protection if an application manages to escape the container: once outside, it won’t have root privileges and can do far less damage.

In fact, the vanilla Linux kernel doesn’t even have this switch. If user namespaces are compiled in, unprivileged users can create them by default; the kernel.unprivileged_userns_clone sysctl comes from a patch introduced by Debian.

So I think keeping user namespaces enabled is pretty secure.

In short, with a user namespace, root inside your container will not be root outside of your container.
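You can see this mapping for yourself by reading /proc/self/uid_map, which lists the first UID inside the namespace, the UID it maps to outside, and the length of the range. Here is a small sketch; container_root_host_uid is a hypothetical helper name.

```shell
# container_root_host_uid [UID_MAP_FILE]
# Print the outside (host) UID that UID 0 of the current namespace maps to.
# Each line of uid_map reads: <first inside UID> <first outside UID> <count>.
container_root_host_uid() {
    awk '$1 == 0 { print $2 }' "${1:-/proc/self/uid_map}"
}
```

On the host this prints 0. Inside a container started with -U it should print the base of the UID range that systemd-nspawn picked, which is an ordinary unprivileged UID as far as the host is concerned.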

Be lazy

Now, to get internet in the container, the easiest way is to use the host’s connection. You can pass command-line arguments to set up networking, bind mounts, etc. each time, but that’s cumbersome and I am lazy.

So I will create this file: /etc/systemd/nspawn/OpenSuseLeap.nspawn

[Network]
VirtualEthernet=no

[Exec]
PrivateUsers=pick

[Files]
BindReadOnly=/etc/resolv.conf
BindReadOnly=/tmp/.X11-unix
BindReadOnly=/tmp/container_xauth
Bind=/dev/dri
Bind=/dev/shm

Remember that the file name (without the .nspawn suffix) and the machine name must match.

You can find detailed documentation at https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html
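Since the file name has to follow the machine name, it can be handy to generate the file from the name. Here is a sketch that writes the same settings as above; write_nspawn is a hypothetical helper, and the optional second argument exists only so you can try it in a scratch directory first.

```shell
# write_nspawn MACHINE [DIR]
# Write the settings shown above to DIR/MACHINE.nspawn
# (DIR defaults to /etc/systemd/nspawn, which needs root to write to).
write_nspawn() {
    machine=$1
    dir=${2:-/etc/systemd/nspawn}
    cat > "$dir/$machine.nspawn" <<'EOF'
[Network]
VirtualEthernet=no

[Exec]
PrivateUsers=pick

[Files]
BindReadOnly=/etc/resolv.conf
BindReadOnly=/tmp/.X11-unix
BindReadOnly=/tmp/container_xauth
Bind=/dev/dri
Bind=/dev/shm
EOF
}
```

Running write_nspawn OpenSuseLeap as root creates /etc/systemd/nspawn/OpenSuseLeap.nspawn, so the name cannot drift from the machine it belongs to.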

In short:

VirtualEthernet=no means don’t create a new virtual interface, just use the host’s networking.

PrivateUsers=pick means use a user namespace.

Since we’re gonna use the host’s network, we also need the /etc/resolv.conf file from our host.

/tmp/.X11-unix is needed to run GUI apps with X11.

/dev/dri and /dev/shm are needed to run apps with hardware acceleration.
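For comparison, this is roughly what the same setup looks like as command-line arguments. It is a sketch, with nspawn_flags as a hypothetical helper name. There is no networking flag in it because, as far as I know, systemd-nspawn started by hand uses the host’s network unless told otherwise; VirtualEthernet=no matters when the container is started through machinectl, where the service unit asks for a virtual interface by default.

```shell
# nspawn_flags
# Print the command-line equivalents of the [Exec] and [Files] settings,
# one flag per line.
nspawn_flags() {
    printf '%s\n' \
        --private-users=pick \
        --bind-ro=/etc/resolv.conf \
        --bind-ro=/tmp/.X11-unix \
        --bind-ro=/tmp/container_xauth \
        --bind=/dev/dri \
        --bind=/dev/shm
}
```

You would use it as sudo systemd-nspawn -b -M OpenSuseLeap $(nspawn_flags). Leaving the substitution unquoted is fine here only because none of the paths contain spaces. Typing that every time is exactly the cumbersome part the settings file saves you from.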

Xauthority black magic

Since we are using a user namespace, we can’t just bind X11-unix, set DISPLAY=:0 and fire up GUI apps. We’ll need an Xauthority file, and we can’t simply bind the host’s one either. We have to do some black magic. Of course, you could just use ssh X11 forwarding, but going through TCP sockets is much slower than the mighty Unix domain sockets.

Here’s what I do on the host

❯ XAUTH=/tmp/container_xauth
❯ touch $XAUTH

The following line is dark magic from the ArchWiki. It copies the X cookie for the current display into the new file and rewrites its family field to ffff (wildcard), so the entry matches any display and hostname.

❯ xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -

Now you can just bind mount it, and that’s where BindReadOnly=/tmp/container_xauth comes in.

Also give it the correct permissions, so that the container’s users can read it

❯ chmod 744 $XAUTH

You can also put the above commands in a shell script

#!/bin/sh

# Recreate the wildcard Xauthority file that gets bind mounted into the container.
XAUTH=/tmp/container_xauth
rm -f "$XAUTH"
touch "$XAUTH"
xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -
chmod 744 "$XAUTH"

All done!! Now just enter the following commands:

❯ sudo machinectl start OpenSuseLeap
❯ sudo machinectl login OpenSuseLeap

And hooray!! You have your easy, secure, lightweight container.

Also, don’t forget to stop the container when you are done
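To actually start a GUI app from the container shell, the app has to be told where the display and the cookie file are. Here is a minimal sketch to run inside the container, assuming the host display is :0 and the bind mounts from the settings file are in place; run_gui is a hypothetical helper name.

```shell
# run_gui COMMAND [ARGS...]
# Run COMMAND with DISPLAY and XAUTHORITY pointing at the files bind mounted
# from the host. DISPLAY falls back to :0 when it is not already set.
run_gui() {
    env DISPLAY="${DISPLAY:-:0}" XAUTHORITY=/tmp/container_xauth "$@"
}
```

For example, run_gui xterm, or whatever X11 app you installed in the container.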

❯ sudo machinectl stop OpenSuseLeap

For Wayland, you need the /run/user/1000/wayland-0 socket (1000 being your user ID). You can just bind mount it if you are using a privileged container (one that does not use a user namespace).

I haven’t found any simple solution yet for running Wayland applications with a user namespace. The only easy solution I know of is to use ssh and waypipe.
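The waypipe route looks roughly like this. It is a sketch that assumes waypipe is installed on both the host and the container, and that an ssh server is running inside the container, none of which is set up in this post; wl_run is a hypothetical helper name.

```shell
# wl_run USER@HOST COMMAND [ARGS...]
# Run a Wayland application on the remote side over ssh and display it on the
# local compositor. waypipe wraps the ssh invocation and proxies the protocol.
wl_run() {
    target=$1
    shift
    waypipe ssh "$target" "$@"
}
```

You would run it on the host as wl_run root@localhost your-wayland-app, replacing the target with whatever address and user your container’s ssh server answers on.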

If you encounter any problems, do comment or let me know