Secure systemd containers the easy way
systemd-nspawn is a great tool if you want to build cross-distro packages, run some apps securely, or just install packages that are not available for your distribution.
I’m gonna use Arch Linux, but you can use any distribution with systemd. There is no need to install anything, as systemd-nspawn is included in the systemd package on Arch Linux.
Go grab an image
I’ll install openSUSE Leap in a systemd-nspawn container.
You can download images from the official site, but those also include a kernel, which we don’t need.
Those images are quite big for our use case, but there is an alternative.
The LXC project builds images for many distributions, and they are much smaller.
Here is the link.
Search for openSUSE and click on image-opensuse.
From there, click on the green balls and download the rootfs.tar.xz file.
You can also tell systemd to download and set up the image for you directly with this easy command:
❯ sudo machinectl pull-tar "https://jenkins.linuxcontainers.org/job/image-opensuse/architecture=amd64,release=15.1,variant=default/lastSuccessfulBuild/artifact/rootfs.tar.xz"
But the download fails a lot, and when you retry, it starts all over again. So just use your browser: when the download fails, hit retry and Firefox will automagically resume from where it stopped.
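If you’d rather stay in the terminal, a download tool that can resume does the same job. Here is a minimal sketch, assuming curl is installed. The function name fetch_rootfs, the limit of 5 attempts and the RETRY_DELAY variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: download URL to DEST, resuming a partial file on retry.
fetch_rootfs() {
    url=$1
    dest=$2
    tries=0
    # -f   : treat HTTP errors as failures
    # -L   : follow redirects
    # -C - : continue from the size of the partial file already on disk
    until curl -f -L -C - -o "$dest" "$url"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 5 ]; then
            echo "giving up after $tries failed attempts" >&2
            return 1
        fi
        # Pause between attempts (2 seconds unless RETRY_DELAY is set).
        sleep "${RETRY_DELAY:-2}"
    done
}
```

Call it with the image URL from above and a target file, for example fetch_rootfs "<image URL>" rootfs.tar.xz, and then import the result as described in the next section.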
You can also use debootstrap to create a Debian or Ubuntu image, but it may take more time than downloading a premade image, depending on your hardware.
❯ cd /var/lib/machines
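❯ debootstrap --include=systemd-container \
--components=main \
stable DebianStable http://deb.debian.org/debian/
The above command creates a Debian Stable installation named DebianStable, and it needs to run as root. The universe component exists only in Ubuntu archives, so add --components=main,universe only when you bootstrap Ubuntu from an Ubuntu mirror.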
Import the image
Now import the tar file. You might wanna rename rootfs.tar.xz to a machine name of your liking first. I renamed mine to OpenSuseLeap.tar.xz
❯ sudo machinectl import-tar /home/smit/Downloads/OpenSuseLeap.tar.xz
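Renaming is not strictly required: machinectl import-tar takes an optional second argument with the image name. A small sketch of that variant follows. The helper name import_rootfs is hypothetical, and it assumes sudo and machinectl are available.

```shell
#!/bin/sh
# Hypothetical helper: import TARBALL as machine NAME without renaming the file.
import_rootfs() {
    tarball=$1
    name=$2
    if [ ! -f "$tarball" ]; then
        echo "no such file: $tarball" >&2
        return 1
    fi
    # The optional second argument of import-tar sets the image name.
    sudo machinectl import-tar "$tarball" "$name" || return 1
    # Show the known images so you can check that the new one is listed.
    machinectl list-images
}
```

For example: import_rootfs /home/smit/Downloads/rootfs.tar.xz OpenSuseLeap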
Configure the image
Now, to set up the machine-id and the root password, you’ll have to get a shell in the machine without actually booting it. It’s more like a normal chroot.
❯ sudo systemd-nspawn -M OpenSuseLeap
Now set password for root
❯ passwd root
and then set machine-id
❯ systemd-machine-id-setup
❯ ln -sf /etc/machine-id /var/lib/dbus/machine-id
Also, if you are going to use the host’s networking, it might be a good idea to disable the wicked network manager
❯ systemctl disable wicked.service
You can also set the hostname
❯ echo GreenGecko > /etc/hostname
The /etc/machine-id file contains the unique machine ID of the local system, which is normally set during installation or boot. Since we are using LXC images, we have to generate it manually; otherwise we’ll get an error saying that no valid machine-id was found.
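If you want to double-check the result before booting, here is a minimal sketch of a sanity check. It assumes the documented format of the file, a single line of 32 lowercase hexadecimal characters. The function name valid_machine_id is my own.

```shell
#!/bin/sh
# Return 0 if FILE holds exactly 32 lowercase hex digits, non-zero otherwise.
valid_machine_id() {
    file=${1:-/etc/machine-id}
    [ -r "$file" ] || return 1
    # Command substitution drops the trailing newline.
    id=$(cat "$file")
    case $id in
        # Any character outside 0-9a-f (including an inner newline) is invalid.
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#id}" -eq 32 ]
}
```

Inside the container shell you could run: valid_machine_id && echo "machine-id looks fine"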
Now start your container with
❯ sudo systemd-nspawn -bU -M OpenSuseLeap
With -b, openSUSE’s systemd starts as init inside the container, and -U runs the container in a user namespace if the kernel supports it. Arch Linux, at least as of this writing, supports user namespaces by default.
Just enter the following command to check:
❯ sysctl kernel.unprivileged_userns_clone
kernel.unprivileged_userns_clone = 1
If it’s set to 1, it’s enabled; otherwise you can set it to 1 manually like this:
❯ sudo sysctl kernel.unprivileged_userns_clone=1
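On kernels that don’t carry this switch, the sysctl command simply fails, which is easy to misread as "disabled". Here is a minimal sketch that tells the three cases apart by reading the file behind the sysctl. The function name userns_status and its directory argument are my own; the argument exists only so the function can be tried against a fake directory.

```shell
#!/bin/sh
# Print "enabled", "disabled" or "no-switch" for unprivileged user namespaces.
# The first argument defaults to /proc/sys.
userns_status() {
    proc_sys=${1:-/proc/sys}
    knob=$proc_sys/kernel/unprivileged_userns_clone
    if [ -r "$knob" ]; then
        # Kernels with the distribution patch expose this file.
        if [ "$(cat "$knob")" = 1 ]; then
            echo enabled
        else
            echo disabled
        fi
    else
        # The file is missing: this kernel has no such switch.
        echo no-switch
    fi
}
```

Running userns_status with no arguments checks the live system.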
Some people believe that enabling user namespaces is not secure and that it can introduce security vulnerabilities.
On the other hand, running containers in a user namespace gives better protection if an application manages to escape the container: once outside, it won’t have root privileges and can do far less damage.
In fact, the vanilla Linux kernel doesn’t even have this switch; there, user namespaces are simply enabled whenever the kernel is built with support for them. The kernel.unprivileged_userns_clone sysctl comes from a patch that Debian introduced so that they can be disabled.
So I think keeping user namespaces enabled is pretty secure.
In short, with a user namespace, root inside your container is not root outside of it.
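You can see this mapping for yourself in /proc/self/uid_map, where each line reads "first ID inside, first ID outside, range length". The following is a minimal sketch with a made-up function name, root_is_host_root, that checks whether UID 0 maps to UID 0.

```shell
#!/bin/sh
# Return 0 if UID 0 in the current namespace maps to UID 0 outside of it.
root_is_host_root() {
    map=${1:-/proc/self/uid_map}
    # Fields per line: <first ID inside> <first ID outside> <range length>
    awk '$1 == 0 && $2 == 0 { found = 1 } END { exit !found }' "$map"
}
```

On the host this should succeed. Inside a container started with -U it should fail, because root there is mapped to some high, unprivileged UID on the host.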
Be lazy
Now, to get internet in the container, the easiest way is to use the host’s connection. You can pass command-line arguments to set up networking, bind volumes, etc. each time, but that’s cumbersome and I am lazy.
So I will create this file: /etc/systemd/nspawn/OpenSuseLeap.nspawn
[Network]
VirtualEthernet=no
[Exec]
PrivateUsers=pick
[Files]
BindReadOnly=/etc/resolv.conf
BindReadOnly=/tmp/.X11-unix
BindReadOnly=/tmp/container_xauth
Bind=/dev/dri
Bind=/dev/shm
Remember that the file name and the machine name must match.
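One way to never get the name wrong is to derive the file name from the machine name. Here is a minimal sketch that writes the settings shown above. The helper write_nspawn_file and its optional directory argument are hypothetical, and writing to /etc needs root.

```shell
#!/bin/sh
# Hypothetical helper: write <DIR>/<MACHINE>.nspawn and print its path.
write_nspawn_file() {
    machine=$1
    dir=${2:-/etc/systemd/nspawn}
    mkdir -p "$dir" || return 1
    # The file name is built from the machine name, so the two always match.
    cat > "$dir/$machine.nspawn" <<'EOF'
[Network]
VirtualEthernet=no
[Exec]
PrivateUsers=pick
[Files]
BindReadOnly=/etc/resolv.conf
BindReadOnly=/tmp/.X11-unix
BindReadOnly=/tmp/container_xauth
Bind=/dev/dri
Bind=/dev/shm
EOF
    echo "$dir/$machine.nspawn"
}
```

Run as root, write_nspawn_file OpenSuseLeap creates /etc/systemd/nspawn/OpenSuseLeap.nspawn.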
You can find detailed documentation at https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html
In short:
VirtualEthernet=no - don’t create a new virtual interface, just use the host’s networking
PrivateUsers=pick - use a user namespace
/etc/resolv.conf - since we’re gonna use the host’s network, we also need the resolver config from our host
/tmp/.X11-unix - needed to run GUI apps with X11
/dev/dri and /dev/shm - needed to run apps with hardware acceleration
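For comparison, this is roughly what you would have to type every time without the settings file. The sketch only prints the command instead of running it. The function name nspawn_cmdline is made up, and I am assuming that leaving out any veth option on the command line corresponds to VirtualEthernet=no.

```shell
#!/bin/sh
# Print (not run) a systemd-nspawn command line that mirrors the settings file.
nspawn_cmdline() {
    machine=$1
    # printf repeats the '%s ' format once per argument.
    printf '%s ' systemd-nspawn --boot --private-users=pick \
        --bind-ro=/etc/resolv.conf \
        --bind-ro=/tmp/.X11-unix \
        --bind-ro=/tmp/container_xauth \
        --bind=/dev/dri \
        --bind=/dev/shm
    printf '%s\n' "--machine=$machine"
}
```

nspawn_cmdline OpenSuseLeap prints the full command so you can compare it with the file.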
Xauthority black magic
Since we are using a user namespace, we can’t just bind X11-unix, set DISPLAY=:0 and fire up GUI apps. We’ll need an Xauthority file, and we can’t just bind the host’s one either. We have to do some black magic. Of course, you can just do SSH X11 forwarding, but TCP sockets are much slower than the mighty Unix domain sockets.
Here’s what I do on the host:
❯ XAUTH=/tmp/container_xauth
❯ touch $XAUTH
The following line is dark magic from the ArchWiki. It basically creates an Xauthority file whose entry is not tied to the host’s hostname, so any user on this system, including one inside the container, can use it.
❯ xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -
Now you can just bind mount it, and that’s where BindReadOnly=/tmp/container_xauth comes in.
Also give it the correct permissions:
❯ chmod 744 $XAUTH
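You can also put the above commands in a shell script:
#!/bin/sh
# Recreate the Xauthority file that gets bind-mounted into the container.
XAUTH=/tmp/container_xauth
rm -f "$XAUTH"
touch "$XAUTH"
# Copy the cookie for $DISPLAY, replacing the family field with ffff.
xauth nextract - "$DISPLAY" | sed -e 's/^..../ffff/' | xauth -f "$XAUTH" nmerge -
chmod 744 "$XAUTH"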
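The sed step in that pipeline is less magic than it looks: xauth nextract prints each entry as hex fields, and the first four characters are the address family. Replacing them with ffff (the wildcard family, as far as I understand it) makes the entry match regardless of hostname. Here is a minimal sketch of just that step. The sample line is made up and contains no real cookie.

```shell
#!/bin/sh
# Replace the 4-character family field at the start of each entry with ffff.
wildcard_family() {
    sed -e 's/^..../ffff/'
}

# Made-up sample entry in the numeric format; the cookie is not real.
sample='0100 0004 686f7374 0001 30 0012 4d49542d4d414749432d434f4f4b49452d31 0010 00112233445566778899aabbccddeeff'
printf '%s\n' "$sample" | wildcard_family
```

The printed line is identical to the sample except that it starts with ffff instead of 0100.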
All done! Now just enter the following commands:
❯ sudo machinectl start OpenSuseLeap
❯ sudo machinectl login OpenSuseLeap
and hooray! You have your easy, secure, lightweight container.
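Before starting a GUI app inside the container, the X11 client has to be told which display and which cookie file to use. Here is a minimal sketch, assuming display :0 and the /tmp/container_xauth path bound earlier. The function name use_host_x11 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, meant to be run inside the container.
# Points X11 clients at the host display and the bind-mounted cookie file.
use_host_x11() {
    export DISPLAY="${1:-:0}"
    export XAUTHORITY="${2:-/tmp/container_xauth}"
}
```

After calling use_host_x11 in your container shell, GUI programs started from that shell should be able to connect to the host’s X server.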
Also, don’t forget to stop the container when you are done
❯ sudo machinectl stop OpenSuseLeap
For Wayland, you need the /run/user/1000/wayland-0 socket. You can just bind mount it if you are using a privileged container (one that doesn’t use a user namespace).
I haven’t found a simple way yet to run Wayland applications with a user namespace. The only easy solution I know is to use ssh and waypipe.
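The waypipe route looks roughly like this. It is only a sketch, and it assumes an SSH server is running inside the container and waypipe is installed on both the host and the container. The helper name run_wayland_app and the target in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a Wayland app in the container and show it on the host.
# Usage: run_wayland_app USER@CONTAINER COMMAND [ARGS...]
run_wayland_app() {
    target=$1
    shift
    # waypipe wraps ssh and forwards the Wayland connection of the remote command.
    waypipe ssh "$target" "$@"
}
```

For example: run_wayland_app root@localhost weston-terminal, where both the target and the application are placeholders for whatever you actually use.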
If you encounter any problems, do comment or let me know.