<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>JanMa's Blog</title>
    <description>My small part of the Web</description>
    <link>/</link>
    <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Fri, 27 Sep 2024 19:02:27 +0000</pubDate>
    <lastBuildDate>Fri, 27 Sep 2024 19:02:27 +0000</lastBuildDate>
    <generator>Jekyll v4.2.0</generator>
    
      <item>
        <title>Deploy systemd units in your Nomad cluster</title>
        <description>&lt;p&gt;Back in 2014, before Kubernetes even existed, &lt;a href=&quot;https://coreos.com/&quot;&gt;CoreOS&lt;/a&gt;
included a simple cluster-scheduler called
&lt;a href=&quot;https://github.com/coreos/fleet&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Fleet&lt;/code&gt;&lt;/a&gt;. With Fleet you could aggregate your
individual machines into a pool of resources and deploy systemd unit files to
them. You could choose to either run your units globally on all machines at the
same time or limit them to a set of hosts. The idea behind it was to treat your
machines as if they shared a single init system.&lt;/p&gt;

&lt;p&gt;In 2018 Fleet was removed from CoreOS in favor of Kubernetes and has not been
maintained since. Nevertheless, the idea of defining a systemd
unit and deploying it to a set of machines still seems useful. So
after some tinkering I came up with a way to do exactly that using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Nomad&lt;/code&gt; and
my &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; &lt;a href=&quot;https://github.com/JanMa/nomad-driver-nspawn&quot;&gt;driver&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I am going to show you how to deploy a simple systemd unit for Consul running
inside a vanilla Debian image into your Nomad cluster.&lt;/p&gt;

&lt;h2 id=&quot;the-unit-file&quot;&gt;The unit file&lt;/h2&gt;

&lt;p&gt;Using the
&lt;a href=&quot;https://www.nomadproject.io/docs/job-specification/template&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;template&lt;/code&gt;&lt;/a&gt; stanza
inside a Nomad job file, we will render a systemd unit for Consul in the local
task directory.&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;
[Unit]
Description=&quot;HashiCorp Consul - A service mesh solution&quot;
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target

[Service]
ExecStart=[[ env &quot;NOMAD_TASK_DIR&quot; ]]/consul/consul agent -dev -bind '{{ GetInterfaceIP &quot;host0&quot; }}' -client '{{ GetInterfaceIP &quot;host0&quot; }}'
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH
&lt;/span&gt;  &lt;span class=&quot;nx&quot;&gt;destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;local/systemd/consul.service&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;left_delimiter&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;[[&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;right_delimiter&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;]]&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The above renders a simple unit file, which runs Consul in dev mode, into the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;local/systemd&lt;/code&gt; directory of the started task. Consul is instructed to bind
its addresses to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;host0&lt;/code&gt; interface, which is available inside a
systemd-nspawn container running with private networking enabled. So that Nomad
does not try to evaluate the templates in the Consul command line arguments, we tell it to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[[&lt;/code&gt;
and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;]]&lt;/code&gt; to delimit templating commands instead of the usual &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;{{&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;}}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To download the Consul binary, we make use of the
&lt;a href=&quot;https://www.nomadproject.io/docs/job-specification/artifact&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;artifact&lt;/code&gt;&lt;/a&gt;
stanza.&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;artifact&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;https://releases.hashicorp.com/consul/1.9.0/consul_1.9.0_linux_amd64.zip&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;local/consul&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With this unit file rendered and the necessary binary downloaded, we need a way
to enable it on startup. We also need to figure out how to make systemd load a
custom unit file from the local task directory instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/systemd/system&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;enabling-the-unit&quot;&gt;Enabling the unit&lt;/h2&gt;

&lt;p&gt;The usual way to enable a systemd unit is to run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemctl enable &amp;lt;unit name&amp;gt;&lt;/code&gt;.
This creates a symbolic link to your unit file inside the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/systemd/system/multi-user.target.wants&lt;/code&gt; directory.
Another way to enable a unit file, without running a command, is to create a drop-in file for the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multi-user&lt;/code&gt; target. In this file you add a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Wants=&lt;/code&gt; directive to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[Unit]&lt;/code&gt; section which lists your
unit name. This ensures that your unit is started together with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multi-user.target&lt;/code&gt;, the same way as using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemctl enable&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;Using another &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;template&lt;/code&gt; stanza in our job file, we can use the second method to
enable the unit file we rendered to the local task directory.&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;
[Unit]
Wants=consul.service
&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH
&lt;/span&gt;  &lt;span class=&quot;nx&quot;&gt;destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;local/systemd/multi-user.target.d/wants.conf&quot;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;loading-systemd-units-from-a-different-path&quot;&gt;Loading systemd units from a different path&lt;/h2&gt;

&lt;p&gt;Systemd includes a nice little feature which allows you to specify additional
paths from which unit files are loaded on startup. All you need to do is
set the environment variable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SYSTEMD_UNIT_PATH&lt;/code&gt; to the directory containing
your files. If it ends with a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;:&lt;/code&gt;, the usual load paths will be appended to the
content of the variable. This is similar to how you set the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATH&lt;/code&gt; variable
inside your shell.&lt;/p&gt;

&lt;p&gt;To set this, we simply need to make sure our systemd-nspawn container is started in
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boot&lt;/code&gt; mode. Then all environment variables we pass to it will be available to
systemd on startup. Using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boot&lt;/code&gt; mode is the default behavior of the
systemd-nspawn task driver, so we only need to define the image we want to use
and the mentioned environment variable.&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;config&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;consul&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;image_download&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;https://cloud.debian.org/images/cloud/buster/20201214-484/debian-10-generic-amd64-20201214-484.qcow2&quot;&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;force&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;raw&quot;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;environment&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;SYSTEMD_UNIT_PATH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;${NOMAD_TASK_DIR}/systemd:&quot;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;the-complete-job&quot;&gt;The complete job&lt;/h2&gt;

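&lt;p&gt;With all of the above in place, the complete job file now looks like this. Note
that the drop-in for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multi-user&lt;/code&gt; target here also lists
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-networkd.service&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-resolved.service&lt;/code&gt; next to
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;consul.service&lt;/code&gt;, to bring up networking and DNS inside the container:&lt;/p&gt;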

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;job&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;consul&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;datacenters&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;dc1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;service&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;linux&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;

    &lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;consul&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;driver&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;nspawn&quot;&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;config&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;consul&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;image_download&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;nx&quot;&gt;url&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;https://cloud.debian.org/images/cloud/buster/20201214-484/debian-10-generic-amd64-20201214-484.qcow2&quot;&lt;/span&gt;
          &lt;span class=&quot;nx&quot;&gt;force&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
          &lt;span class=&quot;nx&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;raw&quot;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;environment&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;nx&quot;&gt;SYSTEMD_UNIT_PATH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;${NOMAD_TASK_DIR}/systemd:&quot;&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

      &lt;span class=&quot;nx&quot;&gt;artifact&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;https://releases.hashicorp.com/consul/1.9.0/consul_1.9.0_linux_amd64.zip&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;local/consul&quot;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

      &lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;
[Unit]
Description=&quot;HashiCorp Consul - A service mesh solution&quot;
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target

[Service]
ExecStart=[[ env &quot;NOMAD_TASK_DIR&quot; ]]/consul/consul agent -dev -bind '{{ GetInterfaceIP &quot;host0&quot; }}' -client '{{ GetInterfaceIP &quot;host0&quot; }}'
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH
&lt;/span&gt;        &lt;span class=&quot;nx&quot;&gt;destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;local/systemd/consul.service&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;left_delimiter&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;[[&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;right_delimiter&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;]]&quot;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

      &lt;span class=&quot;nx&quot;&gt;template&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;
[Unit]
Wants=systemd-networkd.service systemd-resolved.service consul.service
&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;EOH
&lt;/span&gt;        &lt;span class=&quot;nx&quot;&gt;destination&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;local/systemd/multi-user.target.d/wants.conf&quot;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you deploy the job inside your Nomad cluster and spawn a shell inside the task,
you can see that the unit file defined in the job file is properly loaded on startup.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;➜  ~ nomad exec -job consul /bin/bash
root@buster:/# systemctl status consul
● consul.service - &quot;HashiCorp Consul - A service mesh solution&quot;
   Loaded: loaded (/local/systemd/consul.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2021-01-17 19:20:59 CET; 2min 15s ago
     Docs: https://www.consul.io/
 Main PID: 36 (consul)
   CGroup: /system.slice/consul.service
           └─36 /local/consul/consul agent -dev -bind {{ GetInterfaceIP &quot;host0&quot; }} -client {{ GetInterfaceIP &quot;host0&quot; }}

Jan 17 19:20:59 buster consul[36]:     2021-01-17T19:20:59.750+0100 [INFO]  agent.server: member joined, marking health alive: member=buster
Jan 17 19:20:59 buster consul[36]:     2021-01-17T19:20:59.791+0100 [INFO]  agent.server: federation state anti-entropy synced
Jan 17 19:20:59 buster consul[36]:     2021-01-17T19:20:59.826+0100 [DEBUG] agent: Skipping remote check since it is managed automatically: check=serfHealth
Jan 17 19:20:59 buster consul[36]:     2021-01-17T19:20:59.826+0100 [INFO]  agent: Synced node info
Jan 17 19:21:01 buster consul[36]:     2021-01-17T19:21:01.771+0100 [DEBUG] agent: Skipping remote check since it is managed automatically: check=serfHealth
Jan 17 19:21:01 buster consul[36]:     2021-01-17T19:21:01.771+0100 [DEBUG] agent: Node info in sync
Jan 17 19:22:53 buster consul[36]:     2021-01-17T19:22:53.841+0100 [DEBUG] agent: Skipping remote check since it is managed automatically: check=serfHealth
Jan 17 19:22:53 buster consul[36]:     2021-01-17T19:22:53.841+0100 [DEBUG] agent: Node info in sync
Jan 17 19:22:59 buster consul[36]:     2021-01-17T19:22:59.699+0100 [DEBUG] agent.router.manager: Rebalanced servers, new active server: number_of_servers=1 active_server=&quot;buster.dc1 (Addr: tcp/192.168.74.222:8300) (DC: dc1)&quot;
Jan 17 19:22:59 buster consul[36]:     2021-01-17T19:22:59.700+0100 [DEBUG] agent.router.manager: Rebalanced servers, new active server: number_of_servers=1 active_server=&quot;buster (Addr: tcp/192.168.74.222:8300) (DC: dc1)&quot;
root@buster:/# 

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
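
&lt;p&gt;Fleet could also run a unit globally on every machine or limit it to a set of
hosts. The following is a rough, untested sketch of how the job above might be
adapted for that, assuming Nomad's &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;system&lt;/code&gt; job type and a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constraint&lt;/code&gt; stanza. The node class &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nspawn&lt;/code&gt; is a made-up value for
illustration, and only the parts that differ from the complete job are shown.&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;job &quot;consul&quot; {
  datacenters = [&quot;dc1&quot;]

  # &quot;system&quot; asks Nomad to place one allocation on every eligible client,
  # so the group no longer sets a count.
  type = &quot;system&quot;

  # Optional: only consider clients of a certain node class.
  # &quot;nspawn&quot; is a hypothetical value, use whatever class your clients carry.
  constraint {
    attribute = &quot;${node.class}&quot;
    value     = &quot;nspawn&quot;
  }

  group &quot;linux&quot; {
    task &quot;consul&quot; {
      driver = &quot;nspawn&quot;

      # config, artifact and template stanzas exactly as in the
      # complete job file above
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Leaving out the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constraint&lt;/code&gt; stanza should place the unit on all clients in the
datacenter which have the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nspawn&lt;/code&gt; driver available.&lt;/p&gt;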

&lt;p&gt;And that’s all there is to it :-). I hope you find this as useful as I do.&lt;/p&gt;

&lt;p&gt;Jan&lt;/p&gt;
</description>
        <pubDate>Sun, 10 Jan 2021 00:00:00 +0000</pubDate>
        <link>/2021-01-10/nomad-deploy-systemd-units/</link>
        <guid isPermaLink="true">/2021-01-10/nomad-deploy-systemd-units/</guid>
        
        <category>linux</category>
        
        <category>containers</category>
        
        <category>systemd</category>
        
        <category>nspawn</category>
        
        <category>nomad</category>
        
        <category>coreos</category>
        
        <category>fleet</category>
        
        
      </item>
    
      <item>
        <title>Orchestrating containers with Nomad and systemd-nspawn</title>
        <description>&lt;p&gt;In my &lt;a href=&quot;/2019-10-13/systemd-nspawn/&quot;&gt;last post&lt;/a&gt; I
took a deeper look into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; and how to use it to run containers.
Afterwards I decided to figure out the logical next step of how to orchestrate
those containers. This is what this post is all about :-). I will show you how to
use &lt;a href=&quot;https://nomadproject.io&quot;&gt;HashiCorp Nomad&lt;/a&gt; together with a custom plugin I
wrote for it to orchestrate &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; containers across multiple hosts.&lt;/p&gt;

&lt;h2 id=&quot;about-nomad&quot;&gt;About Nomad&lt;/h2&gt;

&lt;p&gt;If you have never heard of Nomad or have never used it before, I recommend
reading the &lt;a href=&quot;https://nomadproject.io/intro/&quot;&gt;Introduction to Nomad&lt;/a&gt; guide.
To quote the docs:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Nomad is a flexible workload orchestrator that enables an organization to
easily deploy and manage any containerized or legacy application using a
single, unified workflow. Nomad can run a diverse workload of Docker,
non-containerized, microservice, and batch applications.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In version &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.9&lt;/code&gt; Nomad introduced a plugin framework which allows you to extend
its functionality with new Task Drivers and Device plugins. Adding a new Task
Driver allows you to execute workloads which are not manageable by the included
ones like Docker or Java. You can find a list of community-supported
Task Drivers in the &lt;a href=&quot;https://nomadproject.io/docs/drivers/external/&quot;&gt;Nomad
Docs&lt;/a&gt;. Those include, for
example:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;LXC&lt;/li&gt;
  &lt;li&gt;Podman&lt;/li&gt;
  &lt;li&gt;Firecracker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the last few months I got acquainted with the new plugin framework and
wrote a custom Task Driver for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt;. I recently released version
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0.1.0&lt;/code&gt; and it is now in a state where I feel it is ready to be shared with the
world :-). You can find the code at
&lt;a href=&quot;https://github.com/JanMa/nomad-driver-nspawn&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the rest of this post, I am going to assume that you have read the guide I
linked above or are otherwise already acquainted with Nomad and its
terminology. You should also have Nomad installed somewhere in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PATH&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;using-the-nspawn-plugin&quot;&gt;Using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nspawn&lt;/code&gt; plugin&lt;/h2&gt;

&lt;p&gt;To build the Task Driver for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt;, you need a recent version of Go
installed. Check out the repository from GitHub, then simply run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;go build&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/JanMa/nomad-driver-nspawn.git
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;nomad-driver-nspawn
go build &lt;span class=&quot;nt&quot;&gt;-mod&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;vendor
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will produce a binary called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nomad-driver-nspawn&lt;/code&gt; which can be used as a
plugin by Nomad. The easiest way to test it is to use Nomad in development
mode. This starts a single-node Nomad cluster on your machine, with the local
agent acting as both client and server. From the git root run&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;nomad agent &lt;span class=&quot;nt&quot;&gt;-dev&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-plugin-dir&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;pwd&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;example/config.hcl
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and visit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;http://127.0.0.1:4646&lt;/code&gt; to see the Nomad web UI. If you click
on the Clients tab, you should see your local machine and the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nspawn&lt;/code&gt; driver
showing up as healthy in the client details.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/nomad_driver_status.png&quot; alt=&quot;Nomad Driver Status&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This means you are ready to start your first task.&lt;/p&gt;

&lt;h2 id=&quot;two-simple-example-jobs&quot;&gt;Two simple example jobs&lt;/h2&gt;

&lt;p&gt;The driver repo contains two examples which show very basic configurations. The
first one is located in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/Debian&lt;/code&gt; folder. There you will find a very
simple &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debian.hcl&lt;/code&gt; file which will start a plain &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Debian/Buster&lt;/code&gt; image that
does exactly nothing (except running systemd).&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;job&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;debian&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;datacenters&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;dc1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;service&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;linux&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;debian&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;driver&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;nspawn&quot;&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;config&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;example/Debian/image&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;resolv_conf&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;copy-host&quot;&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It uses the image located in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/Debian/image&lt;/code&gt; folder inside the git
repo and copies the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/resolv.conf&lt;/code&gt; file of your host into the container to
enable DNS resolution.&lt;/p&gt;

&lt;p&gt;Before you can run the job file, you need to build the container image.
If you haven’t already, install
&lt;a href=&quot;https://github.com/systemd/mkosi&quot;&gt;mkosi&lt;/a&gt; on your machine, open another shell
and run the following command inside the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/Debian&lt;/code&gt; folder.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;mkosi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Mkosi will parse the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi.*&lt;/code&gt; files in the folder and produce an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;image&lt;/code&gt; sub-folder
containing a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Debian/Buster&lt;/code&gt; file tree. Now you can start the Task by executing:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nomad run debian.hcl
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you take a look at the output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nomad status&lt;/code&gt;, you should see a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debian&lt;/code&gt; job with
the status &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;running&lt;/code&gt;. The container should also show up if you call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl
list&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;nomad status
ID      Type     Priority  Status   Submit Date
debian  service  50        running  2020-03-08T09:27:20+01:00
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;machinectl list
MACHINE                                     CLASS     SERVICE        OS     VERSION ADDRESSES
debian-8eeb9876-1195-413c-4433-55dcd779f586 container systemd-nspawn debian 10      192.168.60.38…

1 machines listed.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After the job has started you can attach a shell to the container or run any
command you want in it. Have a look at the output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nomad status debian&lt;/code&gt; and
copy the allocation ID from the last line of output. Then run&lt;/p&gt;
&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nomad alloc &lt;span class=&quot;nb&quot;&gt;exec&lt;/span&gt; &amp;lt;alloc ID&amp;gt; /bin/bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;to start a shell inside the container. You could also do the same thing by
running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl shell &amp;lt;machine-name&amp;gt;&lt;/code&gt;. Note that this only works if the
container was started with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boot&lt;/code&gt; option set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt; (which it is by
default).&lt;/p&gt;
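
&lt;p&gt;If you would rather not copy the allocation ID by hand, the Nomad CLI can pick
an allocation of the job for you. This is only a minimal sketch, assuming your
Nomad version supports the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-job&lt;/code&gt; flag of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nomad alloc exec&lt;/code&gt; and that
the job is still called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debian&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# picks an allocation belonging to the &quot;debian&quot; job and starts a shell in it
nomad alloc exec -job debian /bin/bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;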

&lt;p&gt;Since this example is somewhat useless, let’s move on to the next one. In the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/Nginx&lt;/code&gt; folder you will find a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nginx.hcl&lt;/code&gt; file which is a little more
useful than the previous one.&lt;/p&gt;

&lt;div class=&quot;language-hcl highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nx&quot;&gt;job&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;nginx&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;datacenters&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;dc1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;service&quot;&lt;/span&gt;
  &lt;span class=&quot;nx&quot;&gt;group&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;linux&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
    &lt;span class=&quot;nx&quot;&gt;task&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;nginx&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;driver&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;nspawn&quot;&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;config&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;image&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;example/Nginx/image&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;resolv_conf&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;copy-host&quot;&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;command&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
          &lt;span class=&quot;s2&quot;&gt;&quot;/bin/bash&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
          &lt;span class=&quot;s2&quot;&gt;&quot;-c&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
          &lt;span class=&quot;s2&quot;&gt;&quot;dhclient &amp;amp;&amp;amp; nginx &amp;amp;&amp;amp; tail -f /var/log/nginx/access.log&quot;&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;boot&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;process_two&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;port_map&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;nx&quot;&gt;http&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;80&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;nx&quot;&gt;resources&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nx&quot;&gt;network&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
          &lt;span class=&quot;nx&quot;&gt;port&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;http&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;nx&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;8080&quot;&lt;/span&gt;
          &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The job file uses the image located in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;example/Nginx/image&lt;/code&gt; and copies the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/resolv.conf&lt;/code&gt; as before. It also sets &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boot&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;false&lt;/code&gt; and in turn
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;process_two&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt;, which causes the Bash script configured in the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;command&lt;/code&gt; stanza to be run as process two inside the container. The container’s
port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;80&lt;/code&gt; will be forwarded to port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8080&lt;/code&gt; on your host.&lt;/p&gt;

&lt;p&gt;To start the job, build the image the same way as before by executing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sudo
mkosi&lt;/code&gt; and then run:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nomad run nginx.hcl
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nomad status&lt;/code&gt;, you will now see an additional &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nginx&lt;/code&gt; job running.
You will also see another machine in the output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl list&lt;/code&gt;. If you
try to start a shell inside the container now, Nomad will exit with an error
because this container was not started with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boot&lt;/code&gt; parameter set to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Because the job exposes port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;80&lt;/code&gt; of the container on your host’s port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8080&lt;/code&gt;,
you can &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;curl&lt;/code&gt; it to see if Nginx is actually running. One thing to note here is
that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; does not forward exposed ports to your loopback interface.
So &lt;strong&gt;accessing&lt;/strong&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;127.0.0.1:8080&lt;/code&gt; &lt;strong&gt;will not work&lt;/strong&gt;.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl http://&amp;lt;your-ip&amp;gt;:8080 &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   612  100   612    0     0   149k      0 &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt;:--:-- &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt;:--:-- &lt;span class=&quot;nt&quot;&gt;--&lt;/span&gt;:--:--  199k
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
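
&lt;p&gt;If you are unsure which address to use for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;your-ip&amp;gt;&lt;/code&gt;, listing the
non-loopback IPv4 addresses of your host is one way to find a candidate. A
minimal sketch, assuming the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ip&lt;/code&gt; tool from iproute2 is installed on the host:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# one line per interface, restricted to IPv4 addresses with global scope
ip -4 -brief address show scope global
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Addresses of bridges and container interfaces show up in this list as well, so
pick the one belonging to the interface your host uses to reach the network.&lt;/p&gt;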

&lt;p&gt;Since the last command in the container’s Bash script follows the Nginx log
file inside the container, you can see it when accessing the container’s
logs via Nomad.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nomad alloc logs &amp;lt;alloc ID&amp;gt;
192.168.1.226 - - &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;04/Mar/2020:22:13:48 +0100] &lt;span class=&quot;s2&quot;&gt;&quot;GET / HTTP/1.1&quot;&lt;/span&gt; 200 612 &lt;span class=&quot;s2&quot;&gt;&quot;-&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;curl/7.68.0&quot;&lt;/span&gt;
192.168.1.226 - - &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;06/Mar/2020:21:41:56 +0100] &lt;span class=&quot;s2&quot;&gt;&quot;GET / HTTP/1.1&quot;&lt;/span&gt; 200 612 &lt;span class=&quot;s2&quot;&gt;&quot;-&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;curl/7.68.0&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;These examples are of course very simple and not suited for any real-world
workloads. Make sure to have a look at the driver’s
&lt;a href=&quot;https://github.com/JanMa/nomad-driver-nspawn/blob/master/README.md&quot;&gt;README&lt;/a&gt;
page for all options it currently supports. The naming is kept close to the
names of the underlying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; arguments.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In its current state the driver should allow you to run reasonably complex
workloads, but it is by no means finished. For example, I am trying to
figure out a way to allow executing commands via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nomad alloc exec&lt;/code&gt; in containers
which are not started with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;boot&lt;/code&gt; parameter. I also want to add support for
network modes which allow you to start multiple containers in the same isolated
network namespace.&lt;/p&gt;

&lt;p&gt;Also there’s the problem of image distribution which needs to be solved somehow.
Manually building container images on your hosts before being able to start
tasks on them isn’t really convenient. You can currently circumvent it a bit by
using the &lt;a href=&quot;https://nomadproject.io/docs/job-specification/artifact/&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;artifact&lt;/code&gt;&lt;/a&gt;
stanza in your job file to download an image from some server before starting a
task though. Make sure to have a look at &lt;a href=&quot;https://nspawn.org/&quot;&gt;nspawn.org&lt;/a&gt; for
some nice pre-built container images you could use.&lt;/p&gt;

&lt;p&gt;If you find issues or a bug in the driver or have general questions about it,
please open an &lt;a href=&quot;https://github.com/JanMa/nomad-driver-nspawn/issues&quot;&gt;Issue&lt;/a&gt;
on GitHub and I will try to help you as best I can :-)&lt;/p&gt;

&lt;p&gt;Jan&lt;/p&gt;
</description>
        <pubDate>Sun, 08 Mar 2020 00:00:00 +0000</pubDate>
        <link>/2020-03-08/nomad-nspawn/</link>
        <guid isPermaLink="true">/2020-03-08/nomad-nspawn/</guid>
        
        <category>linux</category>
        
        <category>containers</category>
        
        <category>systemd</category>
        
        <category>nspawn</category>
        
        <category>nomad</category>
        
        
      </item>
    
      <item>
        <title>Running containers with systemd-nspawn</title>
        <description>&lt;p&gt;I recently discovered that apart from &lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd.service.html#&quot;&gt;running
services&lt;/a&gt;,
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd.timer.html#&quot;&gt;scheduling
timers&lt;/a&gt;,
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd-networkd.html#&quot;&gt;configuring your network
interfaces&lt;/a&gt;,
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd-resolved.html#&quot;&gt;resolving
names&lt;/a&gt;
and &lt;em&gt;a lot more&lt;/em&gt; you can also run containers with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd&lt;/code&gt; using
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#&quot;&gt;systemd-nspawn&lt;/a&gt;.
This was completely new to me so I decided to take a deeper look into the
necessary steps to get this up and running. First I looked into how to build
images suitable for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; and then at the different ways to run and
manage containers with the help of builtin tools. After reading this post you
will hopefully also have a rough understanding of how this works and be able
to run simple workloads in containers yourself using only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h1&gt;

&lt;p&gt;To use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; you can install it on Debian based distributions via:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;apt install systemd-container
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On Arch based distributions it comes already pre-packaged with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd&lt;/code&gt;. You also need
to enable and start &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-networkd.service&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-resolved.service&lt;/code&gt; so
networking and name resolution work inside the containers.&lt;/p&gt;
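
&lt;p&gt;On most systemd based hosts both services can be enabled and started in one
step. This is a minimal sketch, assuming the unit names shipped with systemd and
that nothing else on your host (NetworkManager, for example) already manages the
network and DNS configuration:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# --now starts the units immediately in addition to enabling them at boot
sudo systemctl enable --now systemd-networkd.service systemd-resolved.service
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;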

&lt;h1 id=&quot;building-an-image&quot;&gt;Building an image&lt;/h1&gt;

&lt;p&gt;There are several ways in which you can build an image for use with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; depending on the distribution you want to use. In this post I
am going to use &lt;a href=&quot;https://debian.org&quot;&gt;Debian&lt;/a&gt; but you can of course use any
distribution you like. You just have to make sure it contains a valid
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/os-release.html&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/os-release&lt;/code&gt;&lt;/a&gt;
file. It is also helpful if the distribution uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd&lt;/code&gt; as its init
system, but not necessary. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; will also run any other init
system it finds inside the filesystem tree.&lt;/p&gt;

&lt;p&gt;The default way to build images for Debian is to use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debootstrap&lt;/code&gt;. Creating a
minimal image based on the latest stable version Buster can be done by
executing:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;debootstrap &lt;span class=&quot;nt&quot;&gt;--include&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;systemd-container stable /var/lib/machines/Buster
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This creates a filesystem tree inside the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/machines/Buster&lt;/code&gt; directory which you
can use with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt;. To make everything work completely, you have to
perform some post installation steps.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# systemd-nspawn -D /var/lib/machines/Buster&lt;/span&gt;
Spawning container Buster on /var/lib/machines/Buster.
Press ^] three &lt;span class=&quot;nb&quot;&gt;times &lt;/span&gt;within 1s to &lt;span class=&quot;nb&quot;&gt;kill &lt;/span&gt;container.
root@Buster:~# systemctl &lt;span class=&quot;nb&quot;&gt;enable &lt;/span&gt;systemd-networkd
root@Buster:~# systemctl &lt;span class=&quot;nb&quot;&gt;enable &lt;/span&gt;systemd-resolved
root@Buster:~# &lt;span class=&quot;nb&quot;&gt;ln&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-sf&lt;/span&gt; /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
root@Buster:~# &lt;span class=&quot;nb&quot;&gt;mkdir&lt;/span&gt; /etc/systemd/resolved.conf.d
root@Buster:~# &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;[Resolve]&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; /etc/systemd/resolved.conf.d/dns.conf
root@Buster:~# &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;DNS=1.1.1.1 8.8.8.8&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/systemd/resolved.conf.d/dns.conf
root@Buster:~# &lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;pts/0&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/securetty
root@Buster:~# &lt;span class=&quot;nb&quot;&gt;exit
logout
&lt;/span&gt;Container Buster exited successfully.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let’s go through this step by step. I first spawned a shell inside the newly
created image. Then I needed to enable &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-networkd&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-resolved&lt;/code&gt;
in order to get networking and name resolution inside the container working
properly. For this I also linked the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;stub-resolv.conf&lt;/code&gt; generated by
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-resolved&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/resolv.conf&lt;/code&gt; and configured the DNS servers which
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-resolved&lt;/code&gt; will use. Otherwise the running container cannot resolve anything. The DNS
servers are configured inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/systemd/resolved.conf.d/dns.conf&lt;/code&gt;. As a last
step, I added &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pts/0&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/securetty&lt;/code&gt; to enable root logins.&lt;/p&gt;

&lt;p&gt;To make this process more automated, you can also use &lt;a href=&quot;https://github.com/systemd/mkosi&quot;&gt;mkosi&lt;/a&gt;. It is a wrapper
around &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debootstrap&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pacstrap&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;zypper&lt;/code&gt; to create minimal, legacy free OS images.
To install &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi&lt;/code&gt;, clone the &lt;a href=&quot;https://github.com/systemd/mkosi&quot;&gt;git repo&lt;/a&gt; somewhere,
open a shell in it and simply run the install script.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/systemd/mkosi.git
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;mkosi
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;python3 setup.py &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For detailed information about how to use it, see the &lt;a href=&quot;https://github.com/systemd/mkosi/blob/master/mkosi.md&quot;&gt;man page&lt;/a&gt;. Now create an
empty directory somewhere and create the following files in it:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# tree&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt;
├── mkosi.default
└── mkosi.postinst
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# cat mkosi.default
&lt;/span&gt;
[&lt;span class=&quot;n&quot;&gt;Distribution&lt;/span&gt;]
&lt;span class=&quot;n&quot;&gt;Distribution&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;debian&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Release&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;buster&lt;/span&gt;

[&lt;span class=&quot;n&quot;&gt;Output&lt;/span&gt;]
&lt;span class=&quot;n&quot;&gt;Format&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;directory&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Bootable&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;no&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Hostname&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;buster&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Output&lt;/span&gt;=/&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;lib&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;machines&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;buster&lt;/span&gt;

[&lt;span class=&quot;n&quot;&gt;Validation&lt;/span&gt;]
&lt;span class=&quot;n&quot;&gt;Password&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;

[&lt;span class=&quot;n&quot;&gt;Packages&lt;/span&gt;]
&lt;span class=&quot;n&quot;&gt;Packages&lt;/span&gt;=
	&lt;span class=&quot;n&quot;&gt;iputils&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;ping&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;systemd&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;container&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;iproute2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# cat mkosi.postinst&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;#!/bin/sh&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# make sure systemd-networkd and systemd-resolved are running&lt;/span&gt;
systemctl &lt;span class=&quot;nb&quot;&gt;enable &lt;/span&gt;systemd-networkd
systemctl &lt;span class=&quot;nb&quot;&gt;enable &lt;/span&gt;systemd-resolved
&lt;span class=&quot;c&quot;&gt;# make sure we symlink /run/systemd/resolve/stub-resolv.conf to /etc/resolv.conf&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# otherwise curl will fail&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;ln&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-sf&lt;/span&gt; /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
&lt;span class=&quot;c&quot;&gt;# Configure global DNS servers&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;mkdir&lt;/span&gt; /etc/systemd/resolved.conf.d
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;[Resolve]&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; /etc/systemd/resolved.conf.d/dns.conf
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;DNS=1.1.1.1 8.8.8.8&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/systemd/resolved.conf.d/dns.conf
&lt;span class=&quot;c&quot;&gt;# set pts/0 in /etc/securetty to enable root login&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;pts/0&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/securetty
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi&lt;/code&gt; will read the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi.default&lt;/code&gt; file for the settings of the image.
According to the file, it will create a directory at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/machines/buster&lt;/code&gt;
containing a Debian/Buster filesystem tree, make it not bootable inside a virtual machine
and set the host name and root password. It will also install some additional
packages. There are actually a lot more options you could use but I will keep
it rather simple for this post. After the image has been created, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi&lt;/code&gt; will run
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi.postinst&lt;/code&gt; script inside the image, which performs all of the steps
I just did by hand when using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;debootstrap&lt;/code&gt;. Make sure to set the executable flag
on the file after creating it.&lt;/p&gt;
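
&lt;p&gt;Put together, the remaining steps could look like this. It is only a sketch,
assuming your shell is inside the directory containing both files and that
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi&lt;/code&gt; needs root privileges to build the image on your system:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# make the post-installation script executable so mkosi can run it
chmod +x mkosi.postinst
# build the image described in mkosi.default
sudo mkosi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;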

&lt;p&gt;The nice thing about &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mkosi&lt;/code&gt; is that you can easily and in an automated fashion
create OS images for a number of different distributions. Now that we have
created an image, it is time to run it.&lt;/p&gt;

&lt;h1 id=&quot;running-the-image&quot;&gt;Running the image&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; can either be invoked via the command line or run as a system
service in the background. In the service mode, each container runs as its own
service instance using a provided &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn@&lt;/code&gt; unit template. I will first
look at how to invoke it via the command line to get a better understanding
about how it works and then I will use the provided unit template for a more
automated approach.&lt;/p&gt;

&lt;p&gt;There are actually three different ways you can run an image with
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt;, which all work slightly differently. The first way is to boot
the image using its init system just like you would boot a VM. It is important
to note here that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; does &lt;strong&gt;not&lt;/strong&gt; boot a kernel and doesn’t start a
VM. Using the boot mode will provide you with an OS container that is running
multiple processes as well as its own init system. You can compare this mode of
operation to LXC containers or BSD jails. To use it, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--boot&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-b&lt;/code&gt; flag
needs to be passed when invoking it. This is the default mode of operation when
using the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn@&lt;/code&gt; unit template.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;systemd-nspawn &lt;span class=&quot;nt&quot;&gt;--boot&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-D&lt;/span&gt; /var/lib/machines/buster
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The command above will boot the image and present you with a login shell. If you
followed the steps above to build the image, you can now log in as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;root&lt;/code&gt;
user with the password &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;root&lt;/code&gt; to look around a bit. You will notice that the
container shows the same interface names and IP addresses as your host because
network separation was not enabled. Any network service started in this
container, or any port it exposes, is directly available on the IPs
of the host.&lt;/p&gt;

&lt;p&gt;Instead of full-fledged OS containers, you can also start something more similar
to an application container which you might know from
&lt;a href=&quot;https://www.docker.com/&quot;&gt;Docker&lt;/a&gt; or &lt;a href=&quot;https://coreos.com/rkt/&quot;&gt;RKT&lt;/a&gt;. You can
either start an application directly as PID 1 by passing no extra flag at all
or run a stub init process which will then start the application by passing
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--as-pid2&lt;/code&gt;. Note that not all applications are suited to run as PID 1 since
they have to meet a few special requirements that the PID 1 process has. For
example they need to reap all processes spawned by them and also implement
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sysvinit&lt;/code&gt; compatible signal handling. Shells are generally able to satisfy
these requirements but for all other applications it is recommended to use
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--as-pid2&lt;/code&gt; switch.&lt;/p&gt;

&lt;p&gt;To start a shell inside the created image running as PID 2, run the following command:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;systemd-nspawn &lt;span class=&quot;nt&quot;&gt;-a&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-D&lt;/span&gt; /var/lib/machines/buster /bin/bash
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A big caveat in this mode of operation is that name resolution does not seem to
work properly (at least I could not get it working). If this is an issue for the
application you want to run, I would recommend using the boot mode. The
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html&quot;&gt;man page&lt;/a&gt;
has a nice comparison of the three modes:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Switch&lt;/th&gt;
      &lt;th&gt;Explanation&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Neither --as-pid2 nor --boot specified&lt;/td&gt;
      &lt;td&gt;The passed parameters are interpreted as the command line, which is executed as PID 1 in the container.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;--as-pid2 specified&lt;/td&gt;
      &lt;td&gt;The passed parameters are interpreted as the command line, which is executed as PID 2 in the container. A stub init process is run as PID 1.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;--boot specified&lt;/td&gt;
      &lt;td&gt;An init program is automatically searched for and run as PID 1 in the container. The passed parameters are used as invocation parameters for this process.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The issues with name resolution in some containers can be explained by the way
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; handles the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/resolv.conf&lt;/code&gt; file. It is configured by the
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--resolv-conf&lt;/code&gt; command line flag:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If set to “auto” the file is left as it is if private networking is turned on
(see --private-network). Otherwise, if systemd-resolved.service is connectible
its static resolv.conf file is used, and if not the host’s /etc/resolv.conf
file is used. In the latter cases the file is copied if the image is writable,
and bind mounted otherwise. […] Defaults to “auto”.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To use the same DNS servers in the container as on the host, set it to either
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;copy-host&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bind-host&lt;/code&gt;.&lt;/p&gt;
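
&lt;p&gt;If the container is started through &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl&lt;/code&gt; rather than by hand, the same
setting can most likely be made persistent in a per-container settings file (these files are
covered further down in this post). The following is only a minimal sketch: it assumes an image
called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buster&lt;/code&gt; and a settings file at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/systemd/nspawn/buster.nspawn&lt;/code&gt;, and it relies on
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ResolvConf=&lt;/code&gt;, which the systemd.nspawn man page lists as the settings file counterpart of
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--resolv-conf&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[Exec]
# Copy /etc/resolv.conf from the host into the container at startup
# (use bind-host instead to bind mount the file from the host)
ResolvConf=copy-host
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;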

&lt;h2 id=&quot;networking&quot;&gt;Networking&lt;/h2&gt;

&lt;p&gt;In general it can be a good idea to contain the container in a private network
so you don’t have to worry about which ports it exposes unless you explicitly
forward them. To do this, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; offers a variety of options which
differ in complexity. To simply put a container inside its own private &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/28&lt;/code&gt;
subnet you have to pass the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--network-veth&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-n&lt;/code&gt; option. This will create a
virtual ethernet link between the container and the host. Inside the container,
it will be available as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;host0&lt;/code&gt; and on the host side it will be named after the
container, prefixed with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ve-&lt;/code&gt;. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-networkd&lt;/code&gt; comes with a default
configuration to set up the virtual interface on the host and inside the
container as well if it is enabled and running on both. It also takes care of
setting up DHCP on the link as well as the necessary routing options. A
container with private networking can be started like this:&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;systemd-nspawn &lt;span class=&quot;nt&quot;&gt;-bD&lt;/span&gt; /var/lib/machines/buster &lt;span class=&quot;nt&quot;&gt;-n&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
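
&lt;p&gt;For reference, the host side defaults described above come from a network file that ships with
systemd, usually found at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/lib/systemd/network/80-container-ve.network&lt;/code&gt;. The abridged
sketch below shows roughly what that file contains. The exact contents and accepted values differ
between systemd versions, so treat it as an illustration and check the file on your own system:&lt;/p&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[Match]
# Applies to every host side veth interface created by systemd-nspawn
Name=ve-*
Driver=veth

[Network]
# 0.0.0.0/28 asks networkd to pick an unused private /28 range for the link
Address=0.0.0.0/28
LinkLocalAddressing=yes
# Hand out addresses from that range to the container via DHCP
DHCPServer=yes
# NAT the traffic leaving the container (newer releases may use another value here)
IPMasquerade=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;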

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you are also using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker&lt;/code&gt; on your system, you have to do some
tweaking of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iptables&lt;/code&gt; rules so the container can communicate with the outside
world. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker&lt;/code&gt; changes the default behavior of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iptables&lt;/code&gt; so you have to allow
in- and outgoing traffic on the created virtual interface. Example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iptables&lt;/code&gt;
rules can be found in the next section.&lt;/p&gt;

&lt;h1 id=&quot;managing-containers&quot;&gt;Managing containers&lt;/h1&gt;

&lt;p&gt;If you want to run containers via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; in a more automated and
management friendly fashion which is similar to how you would run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker&lt;/code&gt;
containers, you can make use of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl&lt;/code&gt; which also ships with systemd. It
uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn@&lt;/code&gt; unit template mentioned above to start containers
with sensible default settings. Those are:&lt;/p&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;ExecStart&lt;/span&gt;=/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;systemd&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;nspawn&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;quiet&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;keep&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;unit&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;boot&lt;/span&gt; \
--&lt;span class=&quot;n&quot;&gt;link&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;journal&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;try&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;guest&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;network&lt;/span&gt;-&lt;span class=&quot;n&quot;&gt;veth&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;U&lt;/span&gt; \
--&lt;span class=&quot;n&quot;&gt;settings&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;override&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;machine&lt;/span&gt;=%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I did not cover all of them so make sure to look them up in the man page :-).&lt;/p&gt;

&lt;p&gt;To start a container, you can first have a look at all images that are
available. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl&lt;/code&gt; searches images stored in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/machines/&lt;/code&gt;,
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/local/lib/machines/&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/usr/lib/machines/&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/var/lib/container/&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# machinectl list-images &lt;/span&gt;
NAME                                TYPE      RO USAGE CREATED                      MODIFIED                    
buster                              directory no   n/a n/a                          n/a                         

1 images listed.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

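&lt;p&gt;Then you can simply run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl start buster&lt;/code&gt; and it will start an instance of the unit
template for that image. Running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl&lt;/code&gt; without arguments lists the running machines:&lt;/p&gt;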

&lt;div class=&quot;language-shell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# machinectl &lt;/span&gt;
MACHINE CLASS     SERVICE        OS     VERSION ADDRESSES     
buster  container systemd-nspawn debian 10      192.168.68.28…

1 machines listed.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can then use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl login&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl shell&lt;/code&gt; to log in to the running
container and do things or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl status&lt;/code&gt; to check the processes running
inside your container. What is also pretty neat is that you can use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;journalctl
-u systemd-nspawn@buster&lt;/code&gt; on your host to see all log output of the container.&lt;/p&gt;

&lt;p&gt;As mentioned above, if you are also running &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker&lt;/code&gt; on your system, you have to
create a few &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iptables&lt;/code&gt; rules so your container can talk to the outside when you
run it with private networking enabled. The easiest way to do this is to create
an override file for the systemd unit template via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemctl edit
systemd-nspawn@&lt;/code&gt; and add the following content:&lt;/p&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&lt;span class=&quot;n&quot;&gt;Service&lt;/span&gt;]
&lt;span class=&quot;n&quot;&gt;ExecStartPre&lt;/span&gt;=-/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;iptables&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FORWARD&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conntrack&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;ctstate&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RELATED&lt;/span&gt;,&lt;span class=&quot;n&quot;&gt;ESTABLISHED&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ACCEPT&lt;/span&gt; ; \
-/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;iptables&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FORWARD&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; ! -&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ACCEPT&lt;/span&gt; ; \
-/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;iptables&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FORWARD&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ACCEPT&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ExecStopPost&lt;/span&gt;=-/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;iptables&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;D&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FORWARD&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conntrack&lt;/span&gt; --&lt;span class=&quot;n&quot;&gt;ctstate&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RELATED&lt;/span&gt;,&lt;span class=&quot;n&quot;&gt;ESTABLISHED&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ACCEPT&lt;/span&gt; ; \
-/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;iptables&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;D&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FORWARD&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; ! -&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ACCEPT&lt;/span&gt; ; \
-/&lt;span class=&quot;n&quot;&gt;usr&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;bin&lt;/span&gt;/&lt;span class=&quot;n&quot;&gt;iptables&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;D&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FORWARD&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ve&lt;/span&gt;-%&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; -&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ACCEPT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It will invoke &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;iptables&lt;/code&gt; before starting and after stopping the container to
add and delete the necessary rules for the container.&lt;/p&gt;

&lt;h2 id=&quot;configuration-per-container&quot;&gt;Configuration per container&lt;/h2&gt;

&lt;p&gt;If you want to customize the options a container is started with using
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;machinectl&lt;/code&gt; you can create a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.nspawn&lt;/code&gt; file next to your image with the same
name. On startup it will be parsed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; and possibly override the
default settings of the unit template. Have a look at the
&lt;a href=&quot;https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html#&quot;&gt;systemd.nspawn&lt;/a&gt;
man page for the options. To forward port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;80&lt;/code&gt; of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buster&lt;/code&gt; container to port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8080&lt;/code&gt;
on the host, you could create the following &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;buster.nspawn&lt;/code&gt; file in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/systemd/nspawn&lt;/code&gt;.
It cannot be put next to the image since some options are privileged and therefore need
to be set inside &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/systemd/nspawn&lt;/code&gt; to be applied. Information about which options are
privileged can also be found inside the man page.&lt;/p&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[&lt;span class=&quot;n&quot;&gt;Network&lt;/span&gt;]
&lt;span class=&quot;n&quot;&gt;Port&lt;/span&gt;=&lt;span class=&quot;m&quot;&gt;8080&lt;/span&gt;:&lt;span class=&quot;m&quot;&gt;80&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;VirtualEthernet&lt;/span&gt;=&lt;span class=&quot;n&quot;&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After creating the config file and starting the container again, port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;80&lt;/code&gt; on the container will 
be forwarded to port &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;8080&lt;/code&gt; on your host. It is important to note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; will &lt;strong&gt;not&lt;/strong&gt;
forward the port to your loopback interface. So it won’t be available via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;127.0.0.1:8080&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;localhost:8080&lt;/code&gt;.
This caused quite some confusion for me :-)&lt;/p&gt;
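
&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--port&lt;/code&gt; switch, which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Port=&lt;/code&gt; corresponds to, also accepts an optional protocol
prefix. As a hedged sketch, several forwards should be expressible in the same file like this.
The UDP mapping is a made-up example for illustration only, and listing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Port=&lt;/code&gt; more than once
is an assumption worth verifying against the man page of your systemd version:&lt;/p&gt;

&lt;div class=&quot;language-conf highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[Network]
VirtualEthernet=yes
# TCP port 8080 on the host is forwarded to port 80 in the container
Port=tcp:8080:80
# Hypothetical second forward: UDP port 5353 on the host to port 53 in the container
Port=udp:5353:53
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;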

&lt;p&gt;That’s it for now. I hope I could give you a small and understandable introduction on how to run containers with the
help of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt;. I am currently trying to figure out how you could use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;systemd-nspawn&lt;/code&gt; with existing
workload orchestrators like &lt;a href=&quot;https://nomadproject.io&quot;&gt;HashiCorp Nomad&lt;/a&gt; so stay tuned :-)&lt;/p&gt;

&lt;p&gt;Jan&lt;/p&gt;

</description>
        <pubDate>Sun, 13 Oct 2019 00:00:00 +0000</pubDate>
        <link>/2019-10-13/systemd-nspawn/</link>
        <guid isPermaLink="true">/2019-10-13/systemd-nspawn/</guid>
        
        <category>linux</category>
        
        <category>containers</category>
        
        <category>systemd</category>
        
        <category>nspawn</category>
        
        
      </item>
    
      <item>
        <title>Talk: Evolution of a Microservice Infrastructure</title>
        <description>&lt;p&gt;A few weeks back, I had the pleasure of giving my first talk at a conference. I attended the Open Source
Datacenter Conference (&lt;a href=&quot;https://osdc.de&quot;&gt;OSDC&lt;/a&gt;) in Berlin where I talked about the evolution of our microservice
infrastructure at &lt;a href=&quot;https://www.rewe-digital.com/&quot;&gt;REWE digital&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since the end of last year, we performed two very big changes to our microservice environment. First we 
changed our reverse proxy from Nginx to &lt;a href=&quot;https://traefik.io&quot;&gt;Traefik&lt;/a&gt; and afterwards, our container orchestrator
from Docker Swarm (standalone) to &lt;a href=&quot;https://nomadproject.io&quot;&gt;Nomad&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In my talk I gave an overview of what the environment we built looks like and what problems we faced that led us
to change these core parts of our infrastructure.&lt;/p&gt;

&lt;iframe width=&quot;560&quot; height=&quot;315&quot; src=&quot;https://www.youtube-nocookie.com/embed/I3RpW0Lh948&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;If you are interested in the slides, you can find them on &lt;a href=&quot;https://www.slideshare.net/NETWAYS/osdc-2019-evolution-of-a-microserviceinfrastructure-by-jan-martens&quot;&gt;Slideshare&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Jan&lt;/p&gt;
</description>
        <pubDate>Mon, 10 Jun 2019 00:00:00 +0000</pubDate>
        <link>/2019-06-10/talk-osdc/</link>
        <guid isPermaLink="true">/2019-06-10/talk-osdc/</guid>
        
        <category>talk</category>
        
        <category>osdc</category>
        
        <category>microservice</category>
        
        <category>nomad</category>
        
        <category>traefik</category>
        
        <category>consul</category>
        
        <category>video</category>
        
        
      </item>
    
      <item>
        <title>A hypervisor in a box</title>
        <description>&lt;p&gt;Soon after I started to work on my bachelor’s thesis, I had to figure out a practical
and portable way to run a Xen hypervisor on which I could test my development builds.
Initially I just created a simple Ubuntu virtual machine on my Desktop PC and installed
the latest available version of Xen by hand. Unfortunately at the time Ubuntu only shipped
Xen up to version 4.9, but I needed access to features that were only available in versions
4.10 and later.&lt;/p&gt;

&lt;p&gt;So I switched to Alpine Linux which has a &lt;a href=&quot;https://www.alpinelinux.org/downloads/&quot;&gt;very nice version&lt;/a&gt;
with the latest Xen already installed and set up to work as dom0. This served me well for some time
but things quickly became messy once I started to also develop on my Notebook. Keeping manually
set up VMs in sync across multiple machines turned out to be very difficult and annoying.
I had some prior experience with &lt;a href=&quot;https://www.vagrantup.com/&quot;&gt;Vagrant&lt;/a&gt;, so I decided
to create a Vagrant Box which includes an already configured installation of Xen.&lt;/p&gt;

&lt;p&gt;Vagrant is an open source tool for building and managing virtual
machine environments based on VirtualBox, VMWare, KVM and
many more. It is developed by &lt;a href=&quot;https://www.hashicorp.com/&quot;&gt;HashiCorp&lt;/a&gt; and was first released in
2010.&lt;/p&gt;

&lt;p&gt;With the help of Vagrant, it is possible to create the same exact
virtual machine containing the Xen hypervisor on different computers.
Every branch of my &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore&quot;&gt;HermitCore fork&lt;/a&gt;
contains a &lt;strong&gt;vagrant&lt;/strong&gt; directory, in which
the configuration for the virtual machine is stored. Using the
&lt;strong&gt;Vagrantfile&lt;/strong&gt; inside this directory, it is possible to create a
virtual machine based on Ubuntu 18.04. Xen and all
its dependencies are installed automatically when the machine is
created.&lt;/p&gt;

&lt;p&gt;The Vagrantfile for my setup looks like this:&lt;/p&gt;

&lt;div class=&quot;language-ruby highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;no&quot;&gt;Vagrant&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;configure&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;2&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;box&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;generic/ubuntu1804&quot;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;hostname&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;xen&quot;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;define&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;xen&quot;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;network&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:private_network&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:ip&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;192.168.50.11&quot;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;synced_folder&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'../'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'/HermitCore'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;type: &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;rsync&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;rsync__exclude: &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;.git/&quot;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;provider&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:libvirt&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;libvirt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;libvirt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;cpus&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;libvirt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;cpu_mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;host-model&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;libvirt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;nested&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kp&quot;&gt;true&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;libvirt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2048&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;libvirt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;keymap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;de&quot;&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;provider&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;virtualbox&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;memory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;2048&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;cpus&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;2&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;customize&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;modifyvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--paravirtprovider&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;kvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;customize&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;modifyvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--hwvirtex&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;on&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;customize&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;modifyvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--nestedpaging&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;on&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;customize&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;modifyvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--cpu-profile&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;host&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;customize&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;modifyvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--uart1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;0x3F8&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;4&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;vb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;customize&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;modifyvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;:id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;--uartmode1&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;file&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;s2&quot;&gt;&quot;./xen.log&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;vm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;provision&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;shell&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;ss&quot;&gt;inline: &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;SHELL&lt;/span&gt;&lt;span class=&quot;sh&quot;&gt;
     export DEBIAN_FRONTEND=noninteractive
     apt-get update
     apt-get upgrade -qy
     apt-get install libyajl2 qemu-system-x86 xorriso bridge-utils -qy
     apt-get install /HermitCore/vagrant/files/xen-upstream-4.11.0.deb -qy
     ldconfig
     mv /etc/grub.d/10_linux /etc/grub.d/50_linux
     mv /HermitCore/vagrant/files/grub /etc/default/grub
     mv /HermitCore/vagrant/files/50-vagrant.yaml /etc/netplan/50-vagrant.yaml
     update-grub2
     systemctl enable xen-qemu-dom0-disk-backend.service
     systemctl enable xen-init-dom0.service
     systemctl enable xenconsoled.service
     systemctl enable xendomains.service
     systemctl enable xen-watchdog.service
     echo &quot;Reboot VM!&quot;
     reboot
&lt;/span&gt;&lt;span class=&quot;no&quot;&gt;  SHELL&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You can start the VM via either &lt;a href=&quot;https://www.virtualbox.org/&quot;&gt;VirtualBox&lt;/a&gt; or &lt;a href=&quot;https://libvirt.org/&quot;&gt;libvirt&lt;/a&gt;.
Sadly, VirtualBox has no support for nested virtualization, so I’d recommend using libvirt as
the default virtualization provider.
To use Vagrant with libvirt, you first have to install the necessary plugin:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;vagrant plugin install vagrant-libvirt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Afterwards you can start the VM with:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;vagrant up --provider=libvirt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will spin up a Vagrant Box with 2 GiB of memory and two virtual CPUs. After Vagrant has created
the VM, the whole parent folder containing the Git repository, including the built files of
HermitCore, is copied into the virtual machine and is available
inside the &lt;strong&gt;/HermitCore&lt;/strong&gt; directory. Vagrant then runs the inline shell script that is included
in the Vagrantfile. The script installs a custom-built version of Xen with debug output enabled and reboots
the VM. Afterwards everything is set up and ready to go.&lt;/p&gt;

&lt;p&gt;One very nice thing about Vagrant is that it can rsync folders into the running Vagrant Box.
So when I made some changes to HermitCore and built them, I only had to run&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;vagrant rsync
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;from within the &lt;strong&gt;vagrant&lt;/strong&gt; folder and I could start the latest build inside the VM without much effort.&lt;/p&gt;

&lt;p&gt;This setup proved to be very convenient to use, even across multiple systems. I just had to
clone the repo, start the Vagrant Box and could dive right into developing. I hope you found this post interesting
and will also consider using Vagrant to provide a portable development environment for your projects.&lt;/p&gt;

&lt;p&gt;So long,
Jan&lt;/p&gt;
</description>
        <pubDate>Sat, 02 Feb 2019 00:00:00 +0000</pubDate>
        <link>/2019-02-02/hypervisor-in-a-box/</link>
        <guid isPermaLink="true">/2019-02-02/hypervisor-in-a-box/</guid>
        
        <category>xen</category>
        
        <category>vagrant</category>
        
        <category>vm</category>
        
        
      </item>
    
      <item>
        <title>Porting an unikernel to Xen: Wrapping up</title>
        <description>&lt;p&gt;Hello again and a happy new year :-)&lt;/p&gt;

&lt;p&gt;This is the last post of my blog series on how to
port a unikernel to Xen. &lt;a href=&quot;/2018-12-16/os-xen-005/&quot;&gt;Last time&lt;/a&gt; we had a
look at the performance of my implementations and how they compared to the original one.
The results looked very promising, which led me to the conclusion that Xen is a valid choice when
looking for a platform to run HermitCore on. In this post I will wrap things up and provide a small
summary of what I have done.&lt;/p&gt;

&lt;p&gt;Over the course of the last posts, I showed how I extended the unikernel HermitCore
so that it can be executed as a guest in Xen in different ways:
on the one hand as a fully virtualized guest whose hardware is
completely emulated by Xen, which makes it possible to run practically
unmodified operating systems, and on the other hand as a paravirtualized
guest whose OS has to be modified, but which promises better
performance. &lt;a href=&quot;/2018-11-11/os-xen-001/&quot;&gt;First&lt;/a&gt;,
I gave an insight into Xen to explain the
conceptual changes that need to be made to a paravirtualized guest. I
also provided a short introduction to unikernels and HermitCore.&lt;/p&gt;

&lt;p&gt;My implementation &lt;a href=&quot;/2018-11-17/os-xen-002/&quot;&gt;initially&lt;/a&gt;
focused on supporting
operation as a so-called HVM guest. I wrote a wrapper script to
create a bootable ISO image from the unmodified HermitCore files and
start it as a fully virtualized guest in Xen. Apart from support for
multiple CPU cores, this works completely.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;/2018-11-25/os-xen-003/&quot;&gt;next thing&lt;/a&gt; I tried
was to modify HermitCore to work as a fully
paravirtualized guest in Xen. I explained in detail the substantial changes this required at many
crucial places in the source code.
Unfortunately it would also have been necessary to change large parts of the memory
management in order to fulfill the restrictions demanded
by Xen. Since reimplementing the memory management would have been a
bachelor thesis of its own, I chose not to do this.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/2018-12-09/os-xen-004/&quot;&gt;Instead&lt;/a&gt;,
I implemented the necessary changes to run HermitCore as a
hybrid PVH guest under Xen. This had the advantage that no changes to
the memory management were necessary. I could reuse the changes I had already made to the
source code with slight modifications, and HermitCore now runs
as a PVH guest, completely functional except for a few small
restrictions.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/2018-12-16/os-xen-005/&quot;&gt;Finally&lt;/a&gt;
I compared the new HermitCore operating modes
with the original modes using the benchmarks already included in HermitCore. It turned
out that the new modes are able to keep up with the original
implementation in terms of performance except for a few minor
differences.&lt;/p&gt;

&lt;p&gt;For future work, it would be interesting to see what would be necessary
to implement the missing features in HermitCore to make it work
completely under Xen. This includes support for multiple CPU cores, a
working network driver and a working console as a PVH guest. It would
also be interesting to see how extensive the changes to the memory
management must be in order to run HermitCore as a completely
paravirtualized guest.&lt;/p&gt;

&lt;p&gt;In conclusion, Xen is a very interesting additional platform
for HermitCore. The effort required for porting was reasonable and the
performance looks promising. Since Xen is one of the most widely used
hypervisors in the cloud and HermitCore is also developed specifically for
cloud computing, it is certainly a useful addition.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed my blog post series and will be back for my future posts :-)&lt;/p&gt;

&lt;p&gt;Jan&lt;/p&gt;

</description>
        <pubDate>Mon, 07 Jan 2019 00:00:00 +0000</pubDate>
        <link>/2019-01-07/ox-xen-006/</link>
        <guid isPermaLink="true">/2019-01-07/ox-xen-006/</guid>
        
        <category>unikernel</category>
        
        <category>xen</category>
        
        <category>os</category>
        
        <category>hermitcore</category>
        
        
      </item>
    
      <item>
        <title>Porting an unikernel to Xen: Performance comparison</title>
        <description>&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;after finishing the implementation part
&lt;a href=&quot;/2018-12-09/os-xen-004/&quot;&gt;last time&lt;/a&gt;, this week it is time for some
performance comparison. Although neither the HVM nor the PVH port is 100% feature complete, both are
in a state where basic benchmarking is very much possible. The results, as you will see, look very promising when compared with those of the original implementation.&lt;/p&gt;

&lt;h3 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#basic-benchmark&quot;&gt;Basic Benchmark&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#stream-benchmark&quot;&gt;Stream Benchmark&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#boot-time&quot;&gt;Boot time&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HermitCore includes a number of different benchmarks to determine the
performance. This post compares the results of some of these
benchmarks for HermitCore running in different environments. The
following environments were tested:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;KVM&lt;br /&gt;
HermitCore is started in a virtual machine on Linux using QEMU with
&lt;a href=&quot;https://www.linux-kvm.org/page/Main_Page&quot;&gt;KVM&lt;/a&gt; acceleration&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;QEMU&lt;br /&gt;
A virtual machine on Linux using only QEMU without acceleration&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;HVM&lt;br /&gt;
Running as a fully virtualized guest on Xen&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;PVH&lt;br /&gt;
A hybrid PVH guest running on Xen&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each test, the virtual machines were given the same amount of
resources. They were started with one virtual CPU core and 512 megabytes
of RAM. All tests were performed on the same machine, a Lenovo ThinkPad
T470 with the following specifications:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Intel Core i5-7300U CPU with 2.6 GHz, 2 cores and 4 threads&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;16 gigabytes of RAM&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;256 gigabytes of SSD storage&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Running Arch Linux&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;basic-benchmark&quot;&gt;Basic Benchmark&lt;/h2&gt;

&lt;p&gt;First the overhead of a system call and a reschedule was measured. The
basic benchmark invokes the system calls &lt;strong&gt;getpid&lt;/strong&gt; and &lt;strong&gt;sched_yield&lt;/strong&gt;
up to 10,000 times after the cache has been warmed up. It measures how
many CPU cycles the respective calls need on average. Because &lt;strong&gt;getpid&lt;/strong&gt; is the
system call with the shortest runtime, it can be used to determine the
general overhead of a system call. The &lt;strong&gt;sched_yield&lt;/strong&gt; call checks if another
task is ready to be executed and switches to this task. The benchmark
also checks how long it takes to allocate a megabyte of memory and how
long the first write access to a page table takes.&lt;/p&gt;
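
&lt;p&gt;The measurement idea can be sketched in a few lines of Python. This is only an illustration and not the HermitCore benchmark itself: &lt;strong&gt;average_cost_ns&lt;/strong&gt; is a hypothetical helper name, it reports nanoseconds instead of CPU cycles, and most of the measured time is interpreter overhead rather than the system call.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import os
import time

ITERATIONS = 10000  # call count mentioned above


def average_cost_ns(call, iterations=ITERATIONS):
    # Hypothetical helper: average wall-clock cost of one call in nanoseconds.
    for _ in range(iterations):  # warm-up pass, not timed
        call()
    start = time.perf_counter_ns()
    for _ in range(iterations):  # timed pass
        call()
    return (time.perf_counter_ns() - start) / iterations


# os.sched_yield is only available on Unix-like systems.
print(average_cost_ns(os.getpid))
print(average_cost_ns(os.sched_yield))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;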

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;System activity&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;&lt;strong&gt;KVM&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;&lt;strong&gt;QEMU&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;&lt;strong&gt;HVM&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;&lt;strong&gt;PVH&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;getpid&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;9&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;122&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;12&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;12&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;sched_yield&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;79&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;360&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;90&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;83&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;malloc&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;5858&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;51812&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;51311&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;86658&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;write access&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;3368&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;34626&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;42607&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;83368&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;center&gt;Results of the Basic Benchmark (in CPU cycles)&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;It is not surprising that HermitCore running on KVM shows the best
performance overall. It is however interesting to see that PVH and HVM
guests can almost keep up with it with regard to system call performance.
What is also surprising is that the memory access of a PVH guest is much
slower than that of an HVM guest, considering that page table management
is virtualized in hardware for both of them.&lt;/p&gt;

&lt;h2 id=&quot;stream-benchmark&quot;&gt;Stream Benchmark&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Sustainable Memory Bandwidth in Current High Performance
Computers&lt;/strong&gt; (&lt;a href=&quot;https://www.cs.virginia.edu/stream/&quot;&gt;STREAM&lt;/a&gt;) benchmark is a synthetic test written in
Fortran to measure the performance of four distinct long vector
operations. They represent the elementary operations on which vector
codes are based and are specifically intended to eliminate data re-use.
The results display the sustainable memory bandwidth in megabytes per
second and the corresponding computation time for the four vector
operations.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Name&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Function&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: center&quot;&gt;&lt;strong&gt;bytes per iteration&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Copy&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a(i) = b(i)&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;16&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Scale&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a(i) = q * b(i)&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;16&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Add&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a(i) = b(i) + c(i)&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;24&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Triad&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a(i) = b(i) + q * c(i)&lt;/code&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: center&quot;&gt;24&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;center&gt;Functions used in the STREAM benchmark&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
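
&lt;p&gt;STREAM derives the reported bandwidth from the bytes moved per iteration, the array length and the best (minimum) time of a run. The sketch below applies this to the Copy result for KVM from the results table that follows. The array length of 14 million elements is an assumption: it is not stated in this post, but it is the value that makes the numbers in the table consistent. The function name &lt;strong&gt;bandwidth_mb_per_s&lt;/strong&gt; is made up for this example.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ARRAY_LENGTH = 14000000   # assumed element count, inferred from the table
BYTES_PER_ITERATION = 16  # Copy: one 8-byte read plus one 8-byte write


def bandwidth_mb_per_s(bytes_per_iteration, array_length, min_time_s):
    # Bytes moved in one pass divided by the best time, scaled to MB/s.
    return bytes_per_iteration * array_length / min_time_s / 1e6


# Min time of the Copy kernel on KVM, in seconds.
print(round(bandwidth_mb_per_s(BYTES_PER_ITERATION, ARRAY_LENGTH, 0.009596), 1))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The result lands within about one MB/s of the 23342.8 MB/s reported for KVM; the remaining difference comes from the rounded time value.&lt;/p&gt;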

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Environment&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Bandwidth MB/s&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Avg time (s)&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Min time (s)&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Max time (s)&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Copy&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;KVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;23342.8&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.009865&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.009596&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.014814&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;QEMU&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;5153.7&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.045812&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.043464&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.047941&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;HVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;24294.7&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.009369&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.009220&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.010860&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;PVH&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;24141.9&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.009469&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.009278&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.012305&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Scale&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;KVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;16556.5&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.013793&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.013529&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017478&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;QEMU&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;1094.8&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.218594&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.204610&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.229119&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;HVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;17263.1&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.013157&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.012976&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.015088&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;PVH&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;17189.3&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.013252&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.013031&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.019612&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Add&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;KVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;19264.9&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017679&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017441&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.020491&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;QEMU&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;1562.1&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.225724&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.215092&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.237715&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;HVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;20038.9&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.016974&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.016767&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.018722&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;PVH&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;19955.0&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017068&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.016838&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.022021&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Triad&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;KVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;19088.8&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017928&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017602&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.021669&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;QEMU&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;897.4&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.394932&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.374413&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.415066&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;HVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;19856.3&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.017111&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;0.016922&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.018756&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;PVH&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;19772.2&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.017232&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.016994&lt;/td&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;0.021154&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;center&gt;Results of the STREAM benchmark&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;It is very surprising to see that an HVM guest running in Xen
outperforms all others in terms of memory bandwidth and
corresponding computing time, especially considering that an HVM guest
in the testing environment is actually a virtual machine running inside
a virtual machine (the Xen hypervisor) running on top of Linux. The
KVM and PVH VMs achieve almost the same results, though: PVH stays within one
percent of HVM and KVM within about four percent. You can also see that virtualization purely based
on QEMU is considerably slower, reaching only between roughly 5 and 21 percent of the HVM bandwidth.&lt;/p&gt;
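
&lt;p&gt;The deviations can be recomputed from the bandwidth column of the table above. A small sketch, where &lt;strong&gt;deviation_percent&lt;/strong&gt; is a hypothetical helper name and HVM is taken as the reference:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Bandwidth in MB/s as (KVM, HVM, PVH), in the order Copy, Scale, Add, Triad.
RESULTS = [
    (23342.8, 24294.7, 24141.9),
    (16556.5, 17263.1, 17189.3),
    (19264.9, 20038.9, 19955.0),
    (19088.8, 19856.3, 19772.2),
]


def deviation_percent(value, reference):
    # Relative shortfall of value compared to reference, in percent.
    return (reference - value) / reference * 100.0


for kvm, hvm, pvh in RESULTS:
    # Prints the KVM and PVH shortfall against HVM for one kernel.
    print(round(deviation_percent(kvm, hvm), 2), round(deviation_percent(pvh, hvm), 2))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With these numbers KVM ends up roughly four percent below HVM for every kernel, while PVH stays below one percent.&lt;/p&gt;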

&lt;h2 id=&quot;boot-time&quot;&gt;Boot time&lt;/h2&gt;

&lt;p&gt;Finally, the time needed by the VMs to boot was compared. The included
&lt;strong&gt;hello world&lt;/strong&gt; test was run in all environments and the reported boot
time was noted. This is the time HermitCore needs until it is
able to start the &lt;strong&gt;hello world&lt;/strong&gt; application.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th style=&quot;text-align: left&quot;&gt;&lt;strong&gt;Environment&lt;/strong&gt;&lt;/th&gt;
      &lt;th style=&quot;text-align: right&quot;&gt;&lt;strong&gt;Time in ms&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;KVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;80&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;QEMU&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;60&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;HVM&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;6140&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td style=&quot;text-align: left&quot;&gt;&lt;strong&gt;PVH&lt;/strong&gt;&lt;/td&gt;
      &lt;td style=&quot;text-align: right&quot;&gt;80&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;center&gt;Boot time for the different environments&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;HermitCore takes about the same time to boot in all environments. The
only exception is the HVM guest, which takes very long in
comparison to detect and start the emulated devices provided by
Xen. This results in a boot time that is roughly 77 times longer than under KVM or PVH.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The results of the previous benchmarks show that when running as either
an HVM or PVH guest in Xen, HermitCore is able to perform about as
well as the original implementation running as a KVM-accelerated
virtual machine in QEMU. There are some exceptions, notably the
very slow boot time of an HVM guest in comparison to all others and the
long memory access times of PVH and HVM guests in terms of CPU cycles,
but the overall results are very similar.&lt;/p&gt;

&lt;p&gt;I hope you will join me again &lt;a href=&quot;/2019-01-07/ox-xen-006/&quot;&gt;next time&lt;/a&gt; for the last post of this series. There
I will provide a short summary and an outlook on possible future work.&lt;/p&gt;

&lt;p&gt;So long, Jan&lt;/p&gt;
</description>
        <pubDate>Sun, 16 Dec 2018 00:00:00 +0000</pubDate>
        <link>/2018-12-16/os-xen-005/</link>
        <guid isPermaLink="true">/2018-12-16/os-xen-005/</guid>
        
        <category>unikernel</category>
        
        <category>xen</category>
        
        <category>os</category>
        
        <category>hermitcore</category>
        
        
      </item>
    
      <item>
        <title>Porting an unikernel to Xen: PVH guest</title>
        <description>&lt;p&gt;Welcome back to the fourth part of my blog post series. 
&lt;a href=&quot;/2018-11-25/os-xen-003/&quot;&gt;Last time&lt;/a&gt; I wrote
about my process of trying to get HermitCore working as a fully paravirtualized PV
guest in Xen. Unfortunately this didn’t work out for a number of reasons, the main one
being that I would have had to rewrite the memory management code of HermitCore. So I
decided to try to get it working as a PVH guest. This worked out quite nicely, as you will
read in this post. Once again, all source code discussed here can be found in a &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/tree/xen_pvh&quot;&gt;separate
branch&lt;/a&gt; on GitLab.&lt;/p&gt;

&lt;h3 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#booting&quot;&gt;Booting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#initializing-xen-specific-features&quot;&gt;Initializing Xen features&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#creating-a-multiboot-information-struct&quot;&gt;Creating multiboot information&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#console-output&quot;&gt;Console output&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#completing-startup&quot;&gt;Completing startup&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The third guest type I discussed in my thesis is called a &lt;strong&gt;PVH&lt;/strong&gt; guest.
It can be viewed as a kind of hybrid between an &lt;strong&gt;HVM&lt;/strong&gt; and a &lt;strong&gt;PV&lt;/strong&gt;
guest.&lt;/p&gt;

&lt;p&gt;PVH guests work almost exactly like PV guests, with one major exception: a PVH
guest runs in ring 0 and has direct control over its page tables. This
has several advantages. One of the biggest efforts when porting an
operating system to Xen is the page table management code. As mentioned
in the previous post, a lot of effort would have been needed to
modify the existing page table and memory management code to work with
the restrictions placed on a PV guest by Xen. In contrast, a PVH guest
needs no special page table code. It does need some modifications to
work correctly with the Xen hypervisor, but far fewer than a PV
guest.&lt;/p&gt;

&lt;h3 id=&quot;booting&quot;&gt;Booting&lt;/h3&gt;

&lt;p&gt;When starting a PVH guest, Xen defines the following register state:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;ebx&lt;/strong&gt;&lt;br /&gt;
contains the physical memory address where the loader has placed the
boot start info structure.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;cr0&lt;/strong&gt;&lt;br /&gt;
bit 0 (PE) must be set. All the other writable bits are cleared.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;cr4&lt;/strong&gt;&lt;br /&gt;
all bits are cleared.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;cs&lt;/strong&gt;&lt;br /&gt;
must be a 32 bit read/execute code segment with a base of ‘0’ and a
limit of ‘0xFFFFFFFF’. The selector value is unspecified.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;ds&lt;/strong&gt;, &lt;strong&gt;es&lt;/strong&gt;&lt;br /&gt;
must be a 32 bit read/write data segment with a base of ‘0’ and a
limit of ‘0xFFFFFFFF’. The selector values are all unspecified.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;tr&lt;/strong&gt;&lt;br /&gt;
must be a 32 bit TSS (active) with a base of ’0’ and a limit of
’0x67’.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;eflags&lt;/strong&gt;&lt;br /&gt;
bit 17 (VM) must be cleared. Bit 9 (IF) must be cleared. Bit 8 (TF)
must be cleared. Other bits are all unspecified.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All other processor registers and flag bits are unspecified. The OS is
in charge of setting up its own stack, GDT and IDT. Xen starts the PVH
guest in 32 bit protected mode with paging disabled (only the PE bit is
set in &lt;strong&gt;cr0&lt;/strong&gt;), so the guest also has to
provide a 32 bit entry point to Xen with the help of an ELF note. The
domain builder will jump directly to the specified address in the boot
code.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_PHYS32_ENTRY,start
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When HermitCore is not booting directly into 64 bit mode, it first has
to run the included loader. Therefore all changes in the boot code have
to be implemented in the 
&lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_pvh/arch/x86/loader/entry.asm&quot;&gt;&lt;strong&gt;entry.asm&lt;/strong&gt;&lt;/a&gt; 
file of the loader.&lt;/p&gt;

&lt;p&gt;The address of the start info struct gets passed to the guest in the
&lt;strong&gt;ebx&lt;/strong&gt; register instead of the multiboot information, so it has to be
saved. Apart from that, only the ELF note mentioned above has to be added. The
resulting changes in &lt;strong&gt;entry.asm&lt;/strong&gt; are therefore minimal.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SECTION .mboot
global start
start:
    cli ; avoid any interrupt
    jmp stublet
...
SECTION .text
ALIGN 4
stublet:
    ; Initialize stack pointer
    mov esp, boot_stack
    add esp, KERNEL_STACK_SIZE - 16

    ; Save Xen start info
    mov DWORD [xen_start_info], ebx
...
; jump to the boot processor's C code
extern main
jmp main
jmp $

align 4096
global shared_info, hypercall_page
shared_info:
    times 512 DQ 0
hypercall_page:
    times 512 DQ 0

SECTION .data

global mb_info, xen_start_info
ALIGN 8
mb_info:
    DQ 0
xen_start_info:
    DQ 0
    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A special &lt;strong&gt;hypercall_page&lt;/strong&gt; and &lt;strong&gt;shared_info&lt;/strong&gt; page have been added
again as described in the previous post. Similar to a PV guest, the first
thing a PVH guest has to set up are hypercalls and the &lt;strong&gt;shared_info&lt;/strong&gt;
page.&lt;/p&gt;

&lt;h3 id=&quot;initializing-xen-specific-features&quot;&gt;Initializing Xen specific features&lt;/h3&gt;

&lt;p&gt;Setting these up works a little differently than in a purely
paravirtualized guest. Hypercalls have to be specifically enabled and
mapping the &lt;strong&gt;shared_info&lt;/strong&gt; page requires a different hypercall. In
addition, a &lt;strong&gt;hvm_start_info&lt;/strong&gt; struct is passed to the guest instead
of the &lt;strong&gt;start_info_t&lt;/strong&gt; struct described previously.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/hvm_start_info.png&quot; alt=&quot;hvm_start_info&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The values in this struct are almost completely different to the values
of the &lt;strong&gt;start_info_t&lt;/strong&gt; struct but are nevertheless very important for
the following steps in the boot process.&lt;/p&gt;

&lt;h4 id=&quot;hypercalls&quot;&gt;Hypercalls&lt;/h4&gt;

&lt;p&gt;As opposed to a PV guest, a PVH guest has to enable hypercalls before it
can use them. The following code has been taken from Mini-OS.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#define XEN_CPUID_FIRST_LEAF 0x40000000
/*
 * Leaf 1 (0x40000x00)
 * EAX: Largest Xen-information leaf. All leaves up to an including @EAX
 *      are supported by the Xen host.
 * EBX-EDX: &quot;XenVMMXenVMM&quot; signature, allowing positive identification
 *      of a Xen host.
 */
#define XEN_CPUID_SIGNATURE_EBX 0x566e6558 /* &quot;XenV&quot; */
#define XEN_CPUID_SIGNATURE_ECX 0x65584d4d /* &quot;MMXe&quot; */
#define XEN_CPUID_SIGNATURE_EDX 0x4d4d566e /* &quot;nVMM&quot; */

static inline void wrmsrl(unsigned msr, uint64_t val)
{
    wrmsr(msr, (uint32_t)(val &amp;amp; 0xffffffffULL), (uint32_t)(val &amp;gt;&amp;gt; 32));
}

static void hpc_init(void)
{
    uint32_t eax, ebx, ecx, edx, base;

    for ( base = XEN_CPUID_FIRST_LEAF;
          base &amp;lt; XEN_CPUID_FIRST_LEAF + 0x10000; base += 0x100 )
    {
        cpuid(base, &amp;amp;eax, &amp;amp;ebx, &amp;amp;ecx, &amp;amp;edx);

        if ( (ebx == XEN_CPUID_SIGNATURE_EBX) &amp;amp;&amp;amp;
             (ecx == XEN_CPUID_SIGNATURE_ECX) &amp;amp;&amp;amp;
             (edx == XEN_CPUID_SIGNATURE_EDX) &amp;amp;&amp;amp;
             ((eax - base) &amp;gt;= 2) )
            break;
    }

    cpuid(base + 2, &amp;amp;eax, &amp;amp;ebx, &amp;amp;ecx, &amp;amp;edx);
    wrmsrl(ebx, (unsigned long)&amp;amp;hypercall_page);
    barrier();
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To enable hypercalls, the guest first has to execute &lt;a href=&quot;https://wiki.osdev.org/CPUID&quot;&gt;&lt;strong&gt;cpuid&lt;/strong&gt;&lt;/a&gt;
instructions until the registers &lt;strong&gt;ebx&lt;/strong&gt;, &lt;strong&gt;ecx&lt;/strong&gt; and &lt;strong&gt;edx&lt;/strong&gt; contain the
&lt;strong&gt;XEN_CPUID_SIGNATURE&lt;/strong&gt; values defined in the above listing. A second
&lt;strong&gt;cpuid&lt;/strong&gt; call on leaf &lt;strong&gt;base + 2&lt;/strong&gt; returns an MSR index in &lt;strong&gt;ebx&lt;/strong&gt;. The guest
can then issue a &lt;strong&gt;wrmsr&lt;/strong&gt; instruction on that MSR to tell the hypervisor the address of its
desired &lt;strong&gt;hypercall_page&lt;/strong&gt;.&lt;/p&gt;
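
&lt;p&gt;Once the page is registered, Xen fills it with one small stub per hypercall,
and a hypercall is made by calling into the matching stub. The address
calculation can be sketched as follows, assuming the 32 byte stub size and the
hypercall numbers from Xen’s public headers. The helper name
&lt;strong&gt;hypercall_stub_addr&lt;/strong&gt; is made up for this illustration and is not part of
HermitCore or Mini-OS.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;

/* Assumption: Xen places one 32 byte stub per hypercall in the page. */
#define HYPERCALL_STUB_SIZE 32u

/* Hypercall numbers as listed in Xen's public xen.h header. */
#define __HYPERVISOR_memory_op  12u
#define __HYPERVISOR_console_io 18u

/* Hypothetical helper: address of the stub for hypercall number nr. */
static inline uintptr_t hypercall_stub_addr(uintptr_t page, unsigned int nr)
{
    return page + (uintptr_t)nr * HYPERCALL_STUB_SIZE;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the page at &lt;strong&gt;0x1000&lt;/strong&gt;, the &lt;strong&gt;memory_op&lt;/strong&gt; stub would sit at
&lt;strong&gt;0x1180&lt;/strong&gt; and the &lt;strong&gt;console_io&lt;/strong&gt; stub at &lt;strong&gt;0x1240&lt;/strong&gt;.&lt;/p&gt;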

&lt;h4 id=&quot;shared-info-page&quot;&gt;Shared info page&lt;/h4&gt;

&lt;p&gt;A PVH guest doesn’t have access to the same hypercalls a PV guest has.
To map the &lt;strong&gt;shared_info&lt;/strong&gt; page the &lt;strong&gt;HYPERVISOR_memory_op&lt;/strong&gt;
hypercall has to be used. It sets the page frame number (PFN) at which a
specific page should appear in the guest’s address space.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;shared_info_t *map_shared_info(void *p)
{
    struct xen_add_to_physmap xatp;

    xatp.domid = DOMID_SELF;
    xatp.idx = 0;
    xatp.space = XENMAPSPACE_shared_info;
    xatp.gpfn = PFN_DOWN((size_t)&amp;amp;shared_info);
    if ( HYPERVISOR_memory_op(XENMEM_add_to_physmap, &amp;amp;xatp) != 0 )
        asm volatile (&quot;hlt&quot;);

    return &amp;amp;shared_info;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The hypercall takes an operation and a pointer to a &lt;strong&gt;xen_add_to_physmap&lt;/strong&gt; struct
as arguments, both of which are defined in the
&lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_pvh/include/xen/memory.h&quot;&gt;&lt;strong&gt;memory.h&lt;/strong&gt;&lt;/a&gt;
header file.&lt;/p&gt;

&lt;h3 id=&quot;creating-a-multiboot-information-struct&quot;&gt;Creating a multiboot information struct&lt;/h3&gt;

&lt;p&gt;As mentioned previously, HermitCore always expects a
&lt;strong&gt;multiboot_info_t&lt;/strong&gt; struct in the &lt;strong&gt;ebx&lt;/strong&gt; register when booting. It
contains a lot of important information about the environment HermitCore
is running in and is needed in the whole startup process. Considering
that Xen does not pass such a struct to the guest on startup, there are
two possible ways to obtain the needed information.&lt;/p&gt;

&lt;p&gt;One would be to rewrite the parts of the code that rely on the
&lt;strong&gt;multiboot_info_t&lt;/strong&gt; struct to use information provided by Xen
instead. This would have to be done for the loader and the kernel and
would require a lot of changes. The second way is to simply gather all the
information needed once at startup and create a &lt;strong&gt;multiboot_info_t&lt;/strong&gt;
struct, which then can be passed to the code that needs it. Because it is
simpler to implement, I chose the second method and implemented a
&lt;strong&gt;build_multiboot()&lt;/strong&gt; function in the loader’s
&lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_pvh/arch/x86/loader/main.c&quot;&gt;&lt;strong&gt;main.c&lt;/strong&gt;&lt;/a&gt; file.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/multiboot_info_t.png&quot; alt=&quot;multiboot_info_t&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;multiboot_info_t&lt;/strong&gt; struct is a rather large and complex
structure which consists of many variables and pointers to other
structures. Its layout can be seen above. Fortunately,
HermitCore does not need all of the information which could be included
in this struct. The &lt;strong&gt;build_multiboot&lt;/strong&gt; function only has to create a
&lt;strong&gt;multiboot_memory_map_t&lt;/strong&gt; struct, a &lt;strong&gt;multiboot_module_t&lt;/strong&gt; struct
and set the correct &lt;strong&gt;flags&lt;/strong&gt;. In addition to that, the function also
calculates the CPU frequency with the help of the &lt;strong&gt;shared_info&lt;/strong&gt; page.&lt;/p&gt;

&lt;h4 id=&quot;flags&quot;&gt;Flags&lt;/h4&gt;

&lt;p&gt;The following flags need to be set for HermitCore to work properly:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MULTIBOOT_INFO_MEMORY&lt;/strong&gt;&lt;br /&gt;
Information about the available memory is provided&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MULTIBOOT_INFO_CMDLINE&lt;/strong&gt;&lt;br /&gt;
Extra command line options are defined.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MULTIBOOT_INFO_MODS&lt;/strong&gt;&lt;br /&gt;
There are modules passed to the operating system&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;MULTIBOOT_INFO_MEM_MAP&lt;/strong&gt;&lt;br /&gt;
A full memory map is provided&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To set these flags, bits 0, 2, 3 and 6 (counting from zero, i.e. the
first, third, fourth and seventh bit) have to be set to one in the
&lt;strong&gt;flags&lt;/strong&gt; variable of the &lt;strong&gt;multiboot_info_t&lt;/strong&gt; struct.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;mb_tmp.flags = 0x00000001 | 0x00000004 | 0x00000008 | 0x00000040;    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
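
&lt;p&gt;The same value can also be written with named constants, which makes the
intent easier to read. The following is a minimal sketch that assumes the flag
names and values from the Multiboot (version 1) header. The helper
&lt;strong&gt;pvh_multiboot_flags&lt;/strong&gt; is a made-up name and not part of HermitCore.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;

/* Flag values as defined in the Multiboot (version 1) header.
 * Bit positions are counted from zero. */
#define MULTIBOOT_INFO_MEMORY   0x00000001u /* bit 0: mem_lower/mem_upper valid */
#define MULTIBOOT_INFO_CMDLINE  0x00000004u /* bit 2: cmdline valid */
#define MULTIBOOT_INFO_MODS     0x00000008u /* bit 3: mods_count/mods_addr valid */
#define MULTIBOOT_INFO_MEM_MAP  0x00000040u /* bit 6: mmap_addr/mmap_length valid */

/* Hypothetical helper: the flag set described in the list above. */
static inline uint32_t pvh_multiboot_flags(void)
{
    return MULTIBOOT_INFO_MEMORY | MULTIBOOT_INFO_CMDLINE |
           MULTIBOOT_INFO_MODS | MULTIBOOT_INFO_MEM_MAP;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The helper returns &lt;strong&gt;0x4D&lt;/strong&gt;, the same value as the literal expression.&lt;/p&gt;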

&lt;p&gt;There is no need to pass additional command line options to the kernel,
so the &lt;strong&gt;cmdline&lt;/strong&gt; variable can be set to zero.&lt;/p&gt;

&lt;h4 id=&quot;memory-map&quot;&gt;Memory map&lt;/h4&gt;

&lt;p&gt;To create a &lt;strong&gt;multiboot_memory_map_t&lt;/strong&gt; struct the guest first has to
issue the &lt;strong&gt;HYPERVISOR_memory_op&lt;/strong&gt; hypercall to get the memory map
from Xen. The hypercall fills a buffer with &lt;strong&gt;e820entry&lt;/strong&gt; structs, which have to be
translated into &lt;strong&gt;multiboot_memory_map_t&lt;/strong&gt; structs.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* PC BIOS standard E820 types and structure. */
#define E820_RAM          1
#define E820_RESERVED     2
#define E820_ACPI         3
#define E820_NVS          4
#define E820_UNUSABLE     5
#define E820_PMEM         7
#define E820_TYPES        8

struct __attribute__((__packed__)) e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Translating it is done in the following way:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;//get memory map from Xen
struct xen_memory_map mmap; 
mmap.nr_entries = E820_MAX;
mmap.buffer = e820_map;
int rc = HYPERVISOR_memory_op(XENMEM_memory_map, &amp;amp;mmap);
if (rc){
	kprintf(&quot;Getting mmap failed!\n&quot;);
	HALT;	
}
kprintf(&quot;Memmap nr_entries: %d\n&quot;, mmap.nr_entries);
for ( int i = 0; i &amp;lt; mmap.nr_entries ; i++){
	kprintf(&quot;size: %ld addr: %lx type: %d\n&quot;, e820_map[i].size, e820_map[i].addr, e820_map[i].type);
	mboot_mmap[i].len = e820_map[i].size;
	mboot_mmap[i].addr = e820_map[i].addr;
	mboot_mmap[i].type = e820_map[i].type;
	mboot_mmap[i].size = sizeof(multiboot_memory_map_t)-sizeof(uint32_t);
}
mb_tmp.mmap_addr = (multiboot_uint32_t)&amp;amp;mboot_mmap;
mb_tmp.mmap_length = mmap.nr_entries * sizeof(multiboot_memory_map_t);
kprintf(&quot;mmap: 0x%lx mmap_length: 0x%lx\n&quot;, mb_tmp.mmap_addr, mb_tmp.mmap_length);    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
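
&lt;p&gt;The translation step on its own, without the hypercall and the debug
output, can be sketched as a small self-contained function. The struct layouts
mirror the ones used above, but the names &lt;strong&gt;e820entry_sketch&lt;/strong&gt;,
&lt;strong&gt;mb_mmap_sketch&lt;/strong&gt; and &lt;strong&gt;e820_to_multiboot&lt;/strong&gt; are made up for this
illustration. Note that the Multiboot &lt;strong&gt;size&lt;/strong&gt; field does not count itself,
which is why four bytes are subtracted.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Packed layouts mirroring the structs used above (illustrative names). */
struct __attribute__((__packed__)) e820entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

struct __attribute__((__packed__)) mb_mmap_sketch {
    uint32_t size; /* size of the entry, not counting this field */
    uint64_t addr;
    uint64_t len;
    uint32_t type;
};

/* Copy n E820 entries into Multiboot form, return the map length in bytes. */
static size_t e820_to_multiboot(const struct e820entry_sketch *in, size_t n,
                                struct mb_mmap_sketch *out)
{
    for (size_t i = 0; i &amp;lt; n; i++) {
        out[i].size = sizeof(struct mb_mmap_sketch) - sizeof(uint32_t);
        out[i].addr = in[i].addr;
        out[i].len  = in[i].size;
        out[i].type = in[i].type;
    }
    return n * sizeof(struct mb_mmap_sketch);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The return value corresponds to &lt;strong&gt;mmap_length&lt;/strong&gt; in the listing above.&lt;/p&gt;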

&lt;h4 id=&quot;multiboot-module-information&quot;&gt;Multiboot module information&lt;/h4&gt;

&lt;p&gt;When starting HermitCore as a PVH guest, Xen will first load the binary
file of the HermitCore loader and pass the actual application that is
intended to run as a &lt;strong&gt;ram disk&lt;/strong&gt;. This is very similar to a HVM
guest, where the application gets passed as a multiboot module directly.
The virtual memory layout of a PVH guest is almost the same as for a PV
guest, which is described 
&lt;a href=&quot;/2018-11-25/os-xen-003/#modifying-the-assembly-boot-code&quot;&gt;here&lt;/a&gt;. 
The initial ram disk is
located just after the relocated kernel image, which means that the ram
disk starts at the beginning of the first page after the HermitCore
loader. The correct start address can be determined by using the
provided &lt;strong&gt;kernel_end&lt;/strong&gt; variable, which contains the address where the
loader ends, and calculating the start address of the next page from
there.&lt;/p&gt;

&lt;p&gt;The size of the ram disk can be calculated in a similar manner. The
&lt;strong&gt;magic&lt;/strong&gt; variable inside the &lt;strong&gt;hvm_start_info_t&lt;/strong&gt; struct contains
the address where the struct is located. Subtracting the start address
of the ram disk from this address provides the size of the ram disk.
This works because a PVH guest is not passed a list of allocated page
frames so the &lt;strong&gt;hvm_start_info_t&lt;/strong&gt; struct is located just behind the
initial ram disk.&lt;/p&gt;

&lt;p&gt;These two values are all that is needed by the &lt;strong&gt;build_multiboot&lt;/strong&gt;
function to create a &lt;strong&gt;multiboot_module_t&lt;/strong&gt; struct.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;// calculate the next page frame
uint64_t mod_addr = pfn_to_virt(PFN_UP((uint64_t)&amp;amp;kernel_end));
// calculate the size of the ram disk
uint64_t mod_len = (uint64_t)start_info_ptr-&amp;gt;magic - mod_addr;
kprintf(&quot;module0 addr: 0x%lx\n&quot;, mod_addr);
kprintf(&quot;module0 len: 0x%lx\n&quot;, mod_len);
mmod[0].mod_start = mod_addr;
mmod[0].mod_end = mod_addr + mod_len;
mmod[0].cmdline = 0;
mmod[0].pad = 0;
mb_tmp.mods_count = 1;
mb_tmp.mods_addr = (multiboot_uint32_t)&amp;amp;mmod;    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
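
&lt;p&gt;The page rounding behind &lt;strong&gt;PFN_UP&lt;/strong&gt; (and &lt;strong&gt;PFN_DOWN&lt;/strong&gt;, used when mapping the
&lt;strong&gt;shared_info&lt;/strong&gt; page) can be sketched as follows. This assumes 4 KiB pages and
an identity mapping, so that converting a page frame number back to an address
is just a shift. The macros follow the form commonly found in Mini-OS and
Linux, while &lt;strong&gt;next_page_start&lt;/strong&gt; is a made-up helper name.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1ULL &amp;lt;&amp;lt; PAGE_SHIFT)

/* Round an address down or up to a page frame number. */
#define PFN_DOWN(x) ((uint64_t)(x) &amp;gt;&amp;gt; PAGE_SHIFT)
#define PFN_UP(x)   (((uint64_t)(x) + PAGE_SIZE - 1) &amp;gt;&amp;gt; PAGE_SHIFT)

/* Hypothetical helper: first page aligned address at or after addr. */
static inline uint64_t next_page_start(uint64_t addr)
{
    return PFN_UP(addr) &amp;lt;&amp;lt; PAGE_SHIFT;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For a loader ending at &lt;strong&gt;0x201234&lt;/strong&gt;, the ram disk would start at
&lt;strong&gt;0x202000&lt;/strong&gt;. An address that is already page aligned is returned unchanged.&lt;/p&gt;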

&lt;h4 id=&quot;cpu-frequency&quot;&gt;CPU frequency&lt;/h4&gt;

&lt;p&gt;The last thing the &lt;strong&gt;build_multiboot&lt;/strong&gt; function has to do is
calculate the CPU frequency. Strictly speaking, this is not part of the
&lt;strong&gt;multiboot_info_t&lt;/strong&gt; struct but is nevertheless needed by HermitCore.
Determining it at this point in the startup process is very convenient,
which is why it is included in the &lt;strong&gt;build_multiboot&lt;/strong&gt; function.&lt;/p&gt;

&lt;p&gt;The equation to calculate the CPU frequency is very similar to the one
used to calculate the current system time in a PV guest.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;frequency=((10^9 &amp;lt;&amp;lt; 32) ÷ tsc_to_system_mul) &amp;gt;&amp;gt; |tsc_shift|
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The values needed can be gathered from the &lt;strong&gt;shared_info_t&lt;/strong&gt; struct.&lt;/p&gt;
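
&lt;p&gt;A minimal sketch of this calculation in C could look as follows. It
assumes the field semantics documented in Xen’s public headers:
&lt;strong&gt;tsc_to_system_mul&lt;/strong&gt; is a 32 bit fixed point multiplier and &lt;strong&gt;tsc_shift&lt;/strong&gt;
is a signed shift that is applied to the TSC delta before multiplying. Under
that assumption a negative &lt;strong&gt;tsc_shift&lt;/strong&gt; turns the final shift into a left
shift, which the sketch handles explicitly. For a non-negative
&lt;strong&gt;tsc_shift&lt;/strong&gt; it reduces to the formula above. The name
&lt;strong&gt;tsc_frequency_hz&lt;/strong&gt; is made up and is not the function used in HermitCore.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;

/* Hypothetical helper: TSC frequency in Hz from the vCPU time info. */
static uint64_t tsc_frequency_hz(uint32_t tsc_to_system_mul, int8_t tsc_shift)
{
    uint64_t hz;

    if (tsc_to_system_mul == 0)
        return 0; /* time info not populated yet */

    /* 10^9 ns per second, scaled by 2^32 to match the fixed point multiplier */
    hz = (1000000000ULL &amp;lt;&amp;lt; 32) / tsc_to_system_mul;

    /* tsc_shift is signed: undo a left shift with a right shift and vice versa */
    if (tsc_shift &amp;lt; 0)
        hz &amp;lt;&amp;lt;= -tsc_shift;
    else
        hz &amp;gt;&amp;gt;= tsc_shift;

    return hz;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With a multiplier of &lt;strong&gt;0x80000000&lt;/strong&gt; and a shift of zero, this yields
2 GHz. The same multiplier with a shift of -1 yields 4 GHz.&lt;/p&gt;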

&lt;h3 id=&quot;console-output&quot;&gt;Console output&lt;/h3&gt;

&lt;p&gt;In theory it should have been possible to make a PVH HermitCore guest
use the same virtual console a PV guest uses. Mapping the required
shared pages and initializing events to communicate with the control
domain works almost the same way. The necessary functions to initialize
the console device and write data to it are also included in the
&lt;strong&gt;xen_pvh&lt;/strong&gt; branch but are not used. The reason is that,
when trying to write output to the virtual console, the data gets
written correctly into the mapped shared memory page but the control
domain is not notified and therefore displays no output.&lt;/p&gt;

&lt;p&gt;Fortunately Xen provides a second way to display the output of a guest.
There exists a &lt;strong&gt;HYPERVISOR_console_io&lt;/strong&gt; hypercall which can be used
to print data to an emergency debug console. To have access to this
debug console, Xen has to be compiled from source with debug support
enabled. I will provide a short description on how to do this in a 
following post.&lt;/p&gt;

&lt;p&gt;Writing data to the emergency console is very easy. The
&lt;strong&gt;HYPERVISOR_console_io&lt;/strong&gt; hypercall takes three arguments.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;HYPERVISOR_console_io(CONSOLEIO_write, strlen(buf), buf);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first instructs the hypervisor to write data to the console. The
second and third arguments contain the length of the string and the string
itself. For more convenient usage I implemented a &lt;strong&gt;printf&lt;/strong&gt;-like function
&lt;strong&gt;xprintk&lt;/strong&gt; in the same way as for a PV guest.&lt;/p&gt;

&lt;p&gt;To read the data from the emergency console, the &lt;strong&gt;xl dmesg&lt;/strong&gt; command
has to be used inside the control domain. It prints all data written
to the emergency console.&lt;/p&gt;
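
&lt;p&gt;How such a &lt;strong&gt;printf&lt;/strong&gt;-like wrapper can be built is sketched below. This is
not the actual &lt;strong&gt;xprintk&lt;/strong&gt; from the branch, just one plausible shape for it:
format into a local buffer, then hand the buffer and its length to the console.
So that the sketch also runs outside of Xen, the hypercall is replaced by a
stand-in that appends to a capture buffer. All names ending in &lt;strong&gt;_stub&lt;/strong&gt; or
&lt;strong&gt;_sketch&lt;/strong&gt; are made up.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdarg.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Stand-in for HYPERVISOR_console_io(CONSOLEIO_write, len, buf):
 * appends to a capture buffer instead of issuing a hypercall. */
static char console_capture[256];

static int console_write_stub(int len, const char *buf)
{
    size_t used = strlen(console_capture);

    if (len &amp;lt; 0 || used + (size_t)len &amp;gt;= sizeof(console_capture))
        return -1;
    memcpy(console_capture + used, buf, (size_t)len);
    console_capture[used + (size_t)len] = '\0';
    return 0;
}

/* printf-like wrapper: format into a local buffer, then write it out. */
static int xprintk_sketch(const char *fmt, ...)
{
    char buf[128];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    if (n &amp;lt; 0)
        return n;
    if ((size_t)n &amp;gt;= sizeof(buf))
        n = (int)sizeof(buf) - 1; /* output was truncated */

    return console_write_stub(n, buf);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Inside a real guest the stand-in would be replaced by the
&lt;strong&gt;HYPERVISOR_console_io&lt;/strong&gt; call shown above.&lt;/p&gt;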

&lt;h3 id=&quot;completing-startup&quot;&gt;Completing startup&lt;/h3&gt;

&lt;p&gt;With the above features implemented, the HermitCore loader is able to
complete its startup process and load the application binary. The
kernel code doesn’t need many additional changes. After starting the
application binary, hypercalls have to be enabled again to provide the
possibility of console output and the &lt;strong&gt;shared_info&lt;/strong&gt; page has to be
mapped again. This works exactly the same way as in the loader. In
addition to that, the detection of &lt;strong&gt;PCI&lt;/strong&gt; and &lt;strong&gt;UART&lt;/strong&gt;
devices and the initialization of the network were disabled.&lt;/p&gt;

&lt;p&gt;A PVH guest does not have access to any emulated hardware devices,
including PCI and UART devices. So trying to initialize them only wastes
time in the boot process (since none can be found). This is
the same reason why the network initialization is disabled. At the time
of writing it is not possible to use networking when running HermitCore
as a PVH guest on Xen. The drivers necessary to make it work need to be
ported from Mini-OS or Linux to HermitCore.&lt;/p&gt;

&lt;h4 id=&quot;console-output-from-an-application&quot;&gt;Console output from an application&lt;/h4&gt;

&lt;p&gt;To enable console output from applications to the emergency console, a
small trick has been used. Normally, &lt;strong&gt;C&lt;/strong&gt;-programs use &lt;strong&gt;printf&lt;/strong&gt; or
similar functions to display text output inside a terminal window. When
they are compiled into HermitCore, they get linked against HermitCore’s
included &lt;strong&gt;C&lt;/strong&gt; library &lt;a href=&quot;https://github.com/hermitcore/newlib&quot;&gt;&lt;strong&gt;newlib&lt;/strong&gt;&lt;/a&gt;. 
Inside this library, the
function &lt;strong&gt;putchar&lt;/strong&gt;, which is used by &lt;strong&gt;printf&lt;/strong&gt; and similar functions
to write a character to the standard output, is implemented in a way
that each character that would be written to the terminal, gets written
to the UART device instead. This is done by calling the
&lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_pvh/arch/x86/kernel/uart.c#L123&quot;&gt;&lt;strong&gt;write_to_uart&lt;/strong&gt;&lt;/a&gt;
function which is included in HermitCore. By
modifying this function to also write the passed characters to the Xen
emergency console, &lt;strong&gt;C&lt;/strong&gt;-programs can still use the standard functions
to print text.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;static inline void write_to_uart(uint32_t off, unsigned char c)
{
    while (is_transmit_empty() == 0) { PAUSE; }

    if (uartport)
        outportb(uartport + off, c);

    // also write the output to Xen's emergency console
    xprintk(&quot;%c&quot;, c);
}   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;A HermitCore PVH guest is working almost completely. Only a
few things are missing and have to be implemented in the future before
it can be considered fully working. These include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;a working virtual console&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;working networking&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;support for multiple CPUs&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The performance of a PVH guest is astonishing. It boots as fast as a
HermitCore guest running on the &lt;strong&gt;KVM&lt;/strong&gt; hypervisor even though it is a
virtual machine, running inside a virtual machine running on Linux. The
decision to move the focus of the implementation away from a purely
paravirtualized PV guest to a hybrid PVH guest has clearly paid off. 
I will provide a more detailed performance comparison in the next post.
I hope I kept it interesting and that you will be back &lt;a href=&quot;/2018-12-16/os-xen-005/&quot;&gt;next time&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Until then, Jan&lt;/p&gt;

&lt;h3 id=&quot;further-reading&quot;&gt;Further reading&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://xenbits.xen.org/docs/4.5-testing/misc/pvh.html&quot;&gt;PVH specification&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.informit.com/articles/article.aspx?p=2233978&quot;&gt;David Chisnall - Xen PVH&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sun, 09 Dec 2018 00:00:00 +0000</pubDate>
        <link>/2018-12-09/os-xen-004/</link>
        <guid isPermaLink="true">/2018-12-09/os-xen-004/</guid>
        
        <category>unikernel</category>
        
        <category>xen</category>
        
        <category>os</category>
        
        <category>hermitcore</category>
        
        
      </item>
    
      <item>
        <title>Porting an unikernel to Xen: PV guest</title>
        <description>&lt;p&gt;Welcome back once again to my blog post series on how to port an unikernel 
to Xen. &lt;a href=&quot;/2018-11-17/os-xen-002/&quot;&gt;Last week&lt;/a&gt; 
I showed you how I got HermitCore running as a fully virtualized guest in Xen.
This week it is finally time to get our hands dirty and to start modifying 
the operating system :-) . The source code for this post can be found in 
the &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/tree/xen_parav&quot;&gt;xen_parav&lt;/a&gt; branch of my HermitCore fork.&lt;/p&gt;

&lt;h3 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#elf-notes&quot;&gt;ELF notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#modifying-the-assembly-boot-code&quot;&gt;Modifying the assembly boot code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#initializing-xen-specific-features&quot;&gt;Initializing XEN specific features&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#time-keeping&quot;&gt;Time keeping&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#events&quot;&gt;Events&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#console-output&quot;&gt;Console output&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#interrupts&quot;&gt;Interrupts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#memory-management&quot;&gt;Memory Management&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To start HermitCore as a paravirtualized guest, changes have to be made
to the boot code so it can be loaded by the generic ELF loader included
in Xen. Since Xen boots paravirtualized guests directly into &lt;strong&gt;64 bit&lt;/strong&gt;
mode, no changes have to be implemented in the &lt;strong&gt;ldhermit.elf&lt;/strong&gt; loader
binary and the compiled application binary can be started directly.&lt;/p&gt;

&lt;h3 id=&quot;elf-notes&quot;&gt;ELF notes&lt;/h3&gt;

&lt;p&gt;First, the standard Xen ELF notes have to be included, allowing the
binary to be loaded by the Xen toolstack domain builder. When the guest
is started, the application binary is read and the &lt;strong&gt;ELF PT_NOTE&lt;/strong&gt;
program header is parsed. The hypervisor looks in the &lt;strong&gt;.note&lt;/strong&gt; sections
of the ELF file for the “Xen” notes. The description fields are Xen
specific and contain the required information to find out where the
kernel expects its virtual base address, what type of hypervisor it can
work with, certain features the kernel image can support and the
location of the hypercall page, etc. All ELF note elements have the
same basic structure:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/elf_note_structure.png&quot; alt=&quot;ELF note structure&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Name Size&lt;/strong&gt; and &lt;strong&gt;Desc Size&lt;/strong&gt; fields are integers, which specify
the size of the &lt;strong&gt;Name&lt;/strong&gt; and &lt;strong&gt;Desc&lt;/strong&gt; fields (excluding padding). The
&lt;strong&gt;Name&lt;/strong&gt; field specifies the vendor who defined the format of the Note.
Typically, vendors use names which are related to their project and/or
company names. For instance, the GNU Project uses &lt;strong&gt;GNU&lt;/strong&gt; as its name.
The &lt;strong&gt;Type&lt;/strong&gt; field is vendor specific, but it is usually treated as an
integer which identifies the type of the note. The &lt;strong&gt;Desc&lt;/strong&gt; field is
also vendor specific, and usually contains data which depends on the
note type.&lt;/p&gt;
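
&lt;p&gt;Expressed in C, the fixed part of a note is just three 32 bit words,
followed by the padded name and description. The following sketch computes how
many bytes a note occupies, assuming the usual 4 byte alignment. The names
&lt;strong&gt;elf_note_hdr&lt;/strong&gt;, &lt;strong&gt;align4&lt;/strong&gt; and &lt;strong&gt;elf_note_size&lt;/strong&gt; are made up for
this illustration.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;

/* Fixed header of an ELF note: three 32 bit words. */
struct elf_note_hdr {
    uint32_t namesz; /* size of name, including the terminating NUL */
    uint32_t descsz; /* size of desc, excluding padding */
    uint32_t type;   /* vendor specific note type */
};

/* Round n up to the next multiple of 4. */
static inline uint32_t align4(uint32_t n)
{
    return (n + 3u) &amp;amp; ~3u;
}

/* Hypothetical helper: total bytes a note occupies in the file. */
static inline uint32_t elf_note_size(uint32_t namesz, uint32_t descsz)
{
    return (uint32_t)sizeof(struct elf_note_hdr) + align4(namesz) + align4(descsz);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;A note with the name “Xen” (4 bytes including the NUL) and an 11 byte
description therefore takes up 12 + 4 + 12 = 28 bytes.&lt;/p&gt;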

&lt;p&gt;Xen defines the following types for ELF notes:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;XEN_ELFNOTE_INFO
XEN_ELFNOTE_ENTRY
XEN_ELFNOTE_HYPERCALL_PAGE
XEN_ELFNOTE_VIRT_BASE
XEN_ELFNOTE_PADDR_OFFSET
XEN_ELFNOTE_XEN_VERSION
XEN_ELFNOTE_GUEST_OS
XEN_ELFNOTE_GUEST_VERSION
XEN_ELFNOTE_LOADER
XEN_ELFNOTE_PAE_MODE
XEN_ELFNOTE_FEATURES
XEN_ELFNOTE_BSD_SYMTAB
XEN_ELFNOTE_HV_START_LOW
XEN_ELFNOTE_L1_MFN_VALID
XEN_ELFNOTE_SUSPEND_CANCEL
XEN_ELFNOTE_INIT_P2M
XEN_ELFNOTE_MOD_START_PFN
XEN_ELFNOTE_SUPPORTED_FEATURES
XEN_ELFNOTE_PHYS32_ENTRY
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The definitions of the different types can be found in the
&lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/include/xen/elfnote.h&quot;&gt;&lt;strong&gt;elfnote.h&lt;/strong&gt;&lt;/a&gt;
header file. Mini-OS implements a macro to add ELF notes in the following way.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-assembly_x86&quot;&gt;    #define ELFNOTE(name, type, desc)           \
    .pushsection .note.name               ; \
    .align 4                              ; \
    .long 2f - 1f       /* namesz */      ; \
    .long 4f - 3f       /* descsz */      ; \
    .long type          /* type   */      ; \
1:.asciz #name          /* name   */      ; \
2:.align 4                                ; \
3:desc                  /* desc   */      ; \
4:.align 4                                ; \
    .popsection
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It can be called like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz &quot;Mini-OS-x86_64&quot;)
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz &quot;generic&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately, Mini-OS compiles its assembly files with &lt;a href=&quot;https://gcc.gnu.org/&quot;&gt;&lt;strong&gt;gcc&lt;/strong&gt;&lt;/a&gt;,
which uses the AT&amp;amp;T assembly syntax. HermitCore, on the other hand, uses
&lt;a href=&quot;https://www.nasm.us/doc/nasmdoc1.html#section-1.1&quot;&gt;&lt;strong&gt;nasm&lt;/strong&gt;&lt;/a&gt;, which uses the Intel assembly
syntax. Since the two are not compatible, the &lt;strong&gt;ELFNOTE&lt;/strong&gt; macro needs to be rewritten. A very
helpful comparison of the two styles can be found &lt;a href=&quot;https://www.ibm.com/developerworks/library/l-gas-nasm/&quot;&gt;here&lt;/a&gt;.
Without going into detail, this is an equivalent implementation in Intel assembly syntax for HermitCore:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-assembly_x86&quot;&gt;%macro ELFNOTE 3 ; name, type, descr
    align 4
    dd %%2 - %%1
    dd %%4 - %%3
    dd %2
  %%1:
    dd %1
  %%2:
    align 4
  %%3:
    dd %3
  %%4:
    align 4
%endmacro
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which can be called like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-assembly_x86&quot;&gt;SECTION .note
elf_notes:
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_GUEST_OS,&quot;HermitCore&quot;
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_GUEST_VERSION,&quot;0.2.5&quot;
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_LOADER,&quot;generic&quot;
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_XEN_VERSION,&quot;xen-3.0&quot;
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_HYPERCALL_PAGE,hypercall_page
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_ENTRY,_start
  ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_FEATURES,0x3  
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these notes added to the kernel’s &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/arch/x86/kernel/entry.asm&quot;&gt;&lt;strong&gt;entry.asm&lt;/strong&gt;&lt;/a&gt; file, the Xen domain builder is able to detect the binary
and tries to boot it. At this point the domain still dies almost instantly,
so additional changes to the assembly boot code have to be made.&lt;/p&gt;

&lt;h3 id=&quot;modifying-the-assembly-boot-code&quot;&gt;Modifying the assembly boot code&lt;/h3&gt;

&lt;p&gt;The initial boot time environment of a Xen PV guest is different from
the normal initial mode of an x86 processor. Instead of starting with
paging disabled in 16-bit mode, a PV guest is started in either 32-bit or 64-bit
mode with paging enabled and runs on a first set of page tables
provided by the hypervisor. These pages are set up to correspond to the
required invariants and are loaded into the base register of the page
table, but are not explicitly pinned.&lt;/p&gt;

&lt;p&gt;The initial virtual and pseudo physical memory layout is described in the &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/include/xen/xen.h&quot;&gt;&lt;strong&gt;xen.h&lt;/strong&gt;&lt;/a&gt; header file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1. The domain is started within contiguous virtual-memory region
2. The contiguous region ends on an aligned 4MB boundary.
3. This the order of bootstrap elements in the initial 
   virtual region:
    a. relocated kernel image
    b. initial ram disk              [mod_start, mod_len]
       (may be omitted)
    c. list of allocated page frames [mfn_list, nr_pages]
       (unless relocated due to XEN_ELFNOTE_INIT_P2M)
    d. start_info_t structure        [register rSI (x86)]
       in case of dom0 this page contains the console info, too
    e. unless dom0: xenstore ring page
    f. unless dom0: console ring page
    g. bootstrap page tables         [pt_base and CR3 (x86)]
    h. bootstrap stack               [register ESP (x86)]
4. Bootstrap elements are packed together, but each is 
   4kB-aligned.
5. The list of page frames forms a contiguous 'pseudo-physical' 
   memory layout for the domain. In particular, the bootstrap 
   virtual-memory region is a 1:1 mapping to the first section 
   of the pseudo-physical map.
6. All bootstrap elements are mapped read-writable for the guest 
   OS. The only exception is the bootstrap page table, 
   which is mapped read-only.
7. There is guaranteed to be at least 512kB padding after the 
   final bootstrap element. If necessary, the bootstrap 
   virtual region is extended by an extra 4MB to ensure this.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To jump from the assembly boot code to the actual kernel &lt;strong&gt;C&lt;/strong&gt; code,
only a few things are needed. When the guest is launched as
explained above, the &lt;strong&gt;ESI&lt;/strong&gt; or &lt;strong&gt;RSI&lt;/strong&gt; register (depending on whether it
is a 32-bit or 64-bit guest) points to a &lt;strong&gt;start_info_t&lt;/strong&gt; structure, which
is needed later on, so the pointer has to be saved. Other than that, only a stack
has to be set up. This reduces the boot code to just the following
few lines:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-assembly_x86&quot;&gt;SECTION .mboot
global _start
_start:
jmp start64

...

SECTION .ktext
align 4
start64:
cld                                 ; clear the direction flag
add rsp, KERNEL_STACK_SIZE-16       ; set up stack
mov rdi, rsi                        ; pass start_info_t pointer as
extern hermit_main                  ; first argument to hermit_main
call hermit_main                    ; jump into C code
jmp $
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once the code above has run, execution continues in the
&lt;strong&gt;hermit_main&lt;/strong&gt; function defined in the kernel’s &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/kernel/main.c&quot;&gt;&lt;strong&gt;main.c&lt;/strong&gt;&lt;/a&gt; file.&lt;/p&gt;

&lt;h3 id=&quot;initializing-xen-specific-features&quot;&gt;Initializing Xen-specific features&lt;/h3&gt;

&lt;p&gt;The next things the guest has to set up are some Xen-specific features.
Upon starting, a PV guest gets passed a &lt;strong&gt;start_info_t&lt;/strong&gt; structure
which contains a lot of important information for the operating system.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/start_info_structure.png&quot; alt=&quot;start_info_t structure&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Particularly important are the &lt;strong&gt;nr_pages&lt;/strong&gt;, &lt;strong&gt;shared_info&lt;/strong&gt; and
&lt;strong&gt;domU&lt;/strong&gt; entries. They are needed for determining the assigned RAM,
communication with the Xen hypervisor and console output. The assembly
boot code passes the virtual address of the structure to the
&lt;strong&gt;hermit_main&lt;/strong&gt; function as the first argument.&lt;/p&gt;
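
&lt;p&gt;As a rough sketch of how these entries might be read, the following C snippet mirrors only the fields mentioned above in a trimmed-down, hypothetical struct and derives the assigned RAM from &lt;strong&gt;nr_pages&lt;/strong&gt;. The real &lt;strong&gt;start_info_t&lt;/strong&gt; in xen.h has more members and a different layout. The names &lt;strong&gt;start_info_sketch&lt;/strong&gt; and &lt;strong&gt;assigned_ram_bytes&lt;/strong&gt; as well as the 4 KiB page size are assumptions made for this example.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Hypothetical, trimmed-down stand-in for start_info_t. */
struct start_info_sketch {
    unsigned long nr_pages;     /* total pages allocated to this domain */
    unsigned long shared_info;  /* machine address of the shared info page */
    struct {
        unsigned long mfn;      /* machine page number of the console page */
        uint32_t evtchn;        /* event channel used by the console */
    } domU;
};

/* Assigned RAM in bytes: number of pages times the page size. */
static uint64_t assigned_ram_bytes(const struct start_info_sketch *si)
{
    return (uint64_t)si-&amp;gt;nr_pages * SKETCH_PAGE_SIZE;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these assumptions, a guest that was given 32768 pages has 128 MiB of RAM.&lt;/p&gt;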

&lt;h4 id=&quot;hypercalls&quot;&gt;Hypercalls&lt;/h4&gt;

&lt;p&gt;The environment presented to a Xen PV guest is not quite the same as
that of a real x86 system. From the perspective of the operating system,
the biggest difference is that it is running in &lt;strong&gt;ring 1&lt;/strong&gt; or &lt;strong&gt;ring 3&lt;/strong&gt;
instead of &lt;strong&gt;ring 0&lt;/strong&gt;. This means that it cannot execute any privileged
instructions. In order to provide similar functionality, the hypervisor
exposes a set of &lt;strong&gt;hypercalls&lt;/strong&gt; that take the place of these instructions. A
hypercall is conceptually similar to a &lt;strong&gt;system call&lt;/strong&gt;. To request
a service from the hypervisor, the guest calls a function in a shared
memory page which gets mapped by the hypervisor.&lt;/p&gt;

&lt;p&gt;First, a special page must be created in the assembly boot code.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-assembly_x86&quot;&gt;global shared_info, hypercall_page
ALIGN 4096
shared_info:
    times 512 DQ 0

hypercall_page:
    times 512 DQ 0    
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then the virtual address of the &lt;strong&gt;hypercall_page&lt;/strong&gt; gets passed to the
hypervisor in an ELF note:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ELFNOTE &quot;Xen&quot;,XEN_ELFNOTE_HYPERCALL_PAGE,hypercall_page
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Hypercalls are issued by calling an address within this page. The
following listing shows a &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/arch/x86/include/x86_64/hypercall-x86_64.h&quot;&gt;macro&lt;/a&gt; that issues a hypercall without
any arguments.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;extern char hypercall_page[4096];
#define _hypercall0(type, name)			                 \
({						                                 \
long __res;				                                 \
asm volatile (				                             \
&quot;call hypercall_page + (&quot;STR(__HYPERVISOR_##name)&quot; * 32)&quot;\
: &quot;=a&quot; (__res)			                                 \
:				                                         \
: &quot;memory&quot; );			                                 \
(type)__res;				                             \
})
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A list of all hypercalls the guest can use can also be found in the &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/include/xen/xen.h&quot;&gt;&lt;strong&gt;xen.h&lt;/strong&gt;&lt;/a&gt; header file. The
individual hypercalls are explained in more detail where they are used.&lt;/p&gt;
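
&lt;p&gt;The &lt;strong&gt;* 32&lt;/strong&gt; in the macro above means that every hypercall owns a 32-byte slot in the hypercall page. The following sketch computes the byte offset of such a slot from a hypercall number. The helper names are made up for this example, and the number 14 used in the note below is what I believe xen.h assigns to &lt;strong&gt;__HYPERVISOR_update_va_mapping&lt;/strong&gt;; treat it as an assumption and check the header.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define HYPERCALL_SLOT_SIZE 32u    /* bytes per entry, as in the macro above */
#define HYPERCALL_PAGE_SIZE 4096u  /* one page */

/* Byte offset of a hypercall's entry stub within the hypercall page. */
static uint32_t hypercall_offset(uint32_t nr)
{
    return nr * HYPERCALL_SLOT_SIZE;
}

/* Number of entry stubs that fit into one page. */
static uint32_t hypercall_slots(void)
{
    return HYPERCALL_PAGE_SIZE / HYPERCALL_SLOT_SIZE;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With 32-byte slots a 4096-byte page has room for 128 entries, and hypercall number 14 would start at offset 448.&lt;/p&gt;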

&lt;h4 id=&quot;shared-info-page&quot;&gt;Shared Info page&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;/assets/shared_info_structure.png&quot; alt=&quot;shared_info_t structure&quot; /&gt;&lt;/p&gt;

&lt;p&gt;One of the first things a guest has to do is to set up the
&lt;strong&gt;shared_info&lt;/strong&gt; page. It contains a &lt;strong&gt;shared_info_t&lt;/strong&gt; struct which
holds valuable information about the virtual CPUs assigned to the guest,
the system time and event channels which can be used to communicate with
other domains. It is again defined in the &lt;strong&gt;xen.h&lt;/strong&gt; header file. The machine address of the
&lt;strong&gt;shared_info&lt;/strong&gt; page is defined in the &lt;strong&gt;start_info_t&lt;/strong&gt; struct. In
order to map it into the virtual address space, the guest has to issue
the &lt;strong&gt;HYPERVISOR_update_va_mapping&lt;/strong&gt; hypercall.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;HYPERVISOR_update_va_mapping(
    (unsigned long) &amp;amp;shared_info,            // defined in entry.asm
    (unsigned long) start_info-&amp;gt;shared_info, // passed by Xen
    UVMF_INVLPG                              // invalidate TLB entry
);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After this is completed, the guest is able to use a pointer to the
&lt;strong&gt;shared_info&lt;/strong&gt; page as it would any other data structure.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;shared_info_t *HYPERVISOR_shared_info = (shared_info_t*) &amp;amp;shared_info;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;time-keeping&quot;&gt;Time keeping&lt;/h3&gt;

&lt;p&gt;By default, HermitCore determines the system time by simply counting
the clock ticks that have elapsed since boot. This is done by receiving
interrupts from the PIT or APIC timer and counting them.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; /** @brief Get milliseconds since system boot */
static inline uint64_t get_uptime() 
{ 
    return (get_clock_tick() * 1000) / TIMER_FREQ; 
}

/** @brief Returns the current number of ticks. */
static inline uint64_t get_clock_tick(void)
{
    return per_core(timer_ticks);
}

 /* Handles the timer. In this case, it's very simple: We
  * increment the 'timer_ticks' variable every time the
  * timer fires. */
static void timer_handler(struct state *s)
{
	/* Increment our 'tick counter' */
	set_per_core(timer_ticks, per_core(timer_ticks)+1);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The number of ticks is multiplied by 1000 and divided by the timer
frequency to get the elapsed milliseconds.&lt;/p&gt;
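
&lt;p&gt;To make the arithmetic concrete, here is a small sketch of the conversion. The timer frequency of 100 Hz and the helper name &lt;strong&gt;ticks_to_ms&lt;/strong&gt; are assumptions for this example. Multiplying before dividing matters with integer arithmetic, because dividing first would throw away the fraction of a second.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define SKETCH_TIMER_FREQ 100u /* assumption: 100 timer ticks per second */

/* Convert a tick count into milliseconds, multiplying first to keep precision. */
static uint64_t ticks_to_ms(uint64_t ticks)
{
    return (ticks * 1000u) / SKETCH_TIMER_FREQ;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With 100 ticks per second, 150 ticks become 1500 ms. Dividing first would yield 1000 ms.&lt;/p&gt;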

&lt;p&gt;When running as a PV guest in Xen, the operating system does not have
access to a PIT or APIC timer and thus the default time keeping method
will not work. Xen provides the guest with all the information necessary
to keep track of time through the &lt;strong&gt;shared_info_t&lt;/strong&gt; struct.&lt;/p&gt;

&lt;p&gt;In general, there are two types of time that a Xen guest must keep in
mind. The first is the wall clock time - the elapsed real time. It is
used for userspace applications that perform scheduled tasks, display
clocks, and so on. The second is virtual time - the time the guest has
spent executing. Virtual time is essential for scheduling tasks that are
performed within a domain.&lt;/p&gt;

&lt;p&gt;While a guest is scheduled, it receives a periodic tick event every 10
ms. This allows it to easily keep track of virtual time. Real-time
values are somewhat more complicated. Three different time values are
needed to track real time:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Initial system time&lt;br /&gt;
is the time of day when system time is zero.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Current system time&lt;br /&gt;
is the time that has elapsed since the guest was resumed and is
updated whenever the guest is scheduled.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;TSC time&lt;br /&gt;
is the number of cycles that have elapsed since an arbitrary point
in the past.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To calculate the uptime, I implemented a &lt;strong&gt;gettimeofday()&lt;/strong&gt; function
using the provided information.&lt;/p&gt;

&lt;h4 id=&quot;gettimeofday&quot;&gt;gettimeofday()&lt;/h4&gt;

&lt;p&gt;Implementing the &lt;strong&gt;gettimeofday()&lt;/strong&gt; function requires access to the
shared_info page, the TSC and some simple calculations. The
&lt;strong&gt;shared_info_t&lt;/strong&gt; struct contains time values which are regularly
updated by Xen.&lt;/p&gt;

&lt;p&gt;To calculate the time, the guest has to wait until the lowest bits of the
&lt;strong&gt;wc_version&lt;/strong&gt; and &lt;strong&gt;version&lt;/strong&gt; variables equal zero. This indicates that the
time values are not being updated and are safe to read. Then, the
current system time can be calculated with the following formula:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;current system time = system_time + ((((tsc - tsc_timestamp) &amp;lt;&amp;lt; tsc_shift) * tsc_to_system_mul) &amp;gt;&amp;gt; 32)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
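
&lt;p&gt;The version check and the formula can be sketched in C as follows. This is not the code from time.c: the struct is a simplified, hypothetical mirror of the fields named in the formula, the TSC value is passed in as a parameter instead of being read with RDTSC, and a negative &lt;strong&gt;tsc_shift&lt;/strong&gt; is assumed to mean a right shift. The sketch also re-reads the version after the calculation and retries if it changed, which goes slightly beyond the description above.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdatomic.h&amp;gt;

/* Hypothetical, simplified mirror of the time fields Xen publishes. */
struct pv_time_sketch {
    volatile uint32_t version;   /* odd while Xen is updating the fields */
    uint64_t tsc_timestamp;      /* TSC value at the last update */
    uint64_t system_time;        /* nanoseconds at the last update */
    uint32_t tsc_to_system_mul;  /* 32-bit fixed-point fraction */
    int8_t   tsc_shift;          /* assumption: may be negative */
};

/* ((delta &amp;lt;&amp;lt; shift) * mul) &amp;gt;&amp;gt; 32 without needing a 128-bit type. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift &amp;lt; 0)
        delta &amp;gt;&amp;gt;= -shift;
    else
        delta &amp;lt;&amp;lt;= shift;
    uint64_t lo = (delta &amp;amp; 0xffffffffULL) * mul;  /* low half times mul */
    uint64_t hi = (delta &amp;gt;&amp;gt; 32) * mul;            /* high half times mul */
    return hi + (lo &amp;gt;&amp;gt; 32);
}

/* tsc is passed in so the sketch stays testable; a guest would read it
 * with RDTSC inside the loop. */
static uint64_t current_system_time(const struct pv_time_sketch *t, uint64_t tsc)
{
    uint32_t before, after;
    uint64_t now;

    do {
        before = t-&amp;gt;version;
        atomic_thread_fence(memory_order_acquire);
        now = t-&amp;gt;system_time +
              scale_delta(tsc - t-&amp;gt;tsc_timestamp, t-&amp;gt;tsc_to_system_mul, t-&amp;gt;tsc_shift);
        atomic_thread_fence(memory_order_acquire);
        after = t-&amp;gt;version;
    } while ((before &amp;amp; 1u) || before != after);  /* retry if an update interfered */

    return now;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Splitting the multiplication into a low and a high half avoids overflowing 64 bits when the shifted TSC delta is large.&lt;/p&gt;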
&lt;p&gt;To get the wall clock time, the calculated system time has to be added
to the &lt;strong&gt;wc_sec&lt;/strong&gt; and &lt;strong&gt;wc_nsec&lt;/strong&gt; values.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;nanoseconds = wc_nsec + current system time
seconds = wc_sec + (nanoseconds ÷ 1,000,000,000)
nanoseconds = nanoseconds mod 1,000,000,000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;seconds&lt;/strong&gt; now contains the elapsed seconds since the &lt;a href=&quot;https://en.wikipedia.org/wiki/Unix_time&quot;&gt;Epoch&lt;/a&gt;. A
complete implementation of the &lt;strong&gt;gettimeofday()&lt;/strong&gt; function can be found
in the file &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/arch/x86/kernel/time.c&quot;&gt;&lt;strong&gt;time.c&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
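
&lt;p&gt;For illustration, the wall clock arithmetic above could be written like this in C. This is a sketch and not the implementation from time.c; the struct &lt;strong&gt;wall_time_sketch&lt;/strong&gt; and the function &lt;strong&gt;wall_clock&lt;/strong&gt; are hypothetical names, and the system time is assumed to be given in nanoseconds.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define NSEC_PER_SEC 1000000000ULL

struct wall_time_sketch {
    uint64_t seconds;      /* seconds since the Epoch */
    uint64_t nanoseconds;  /* remainder, always below one second */
};

/* Add the current system time to the wall clock base values. */
static struct wall_time_sketch wall_clock(uint32_t wc_sec, uint32_t wc_nsec,
                                          uint64_t system_time_ns)
{
    uint64_t ns = (uint64_t)wc_nsec + system_time_ns;
    struct wall_time_sketch now;

    now.seconds = (uint64_t)wc_sec + ns / NSEC_PER_SEC;  /* carry whole seconds */
    now.nanoseconds = ns % NSEC_PER_SEC;
    return now;
}
&lt;/code&gt;&lt;/pre&gt;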

&lt;h3 id=&quot;events&quot;&gt;Events&lt;/h3&gt;

&lt;p&gt;Event channels are the basic element that Xen provides for event
notifications. An event is the Xen equivalent of a hardware interrupt.
Each event essentially stores one bit of information: the event of interest is
signalled by switching that bit from 0 to 1. Notifications are received
by a guest via an upcall from Xen that indicates when an event occurs
(by setting the bit). Further notifications are masked until the bit is
cleared again. Therefore guests must check the value of the bit after
re-enabling the delivery of events to ensure that no
notifications are missed. Event notifications can be masked by setting
a flag. This is equivalent to disabling interrupts and can be used to
ensure the atomicity of certain operations in the guest kernel.&lt;/p&gt;

&lt;p&gt;All event notifications are received by the same handler. The guest has
to set up a way to dispatch events to their correct handlers when they
are received. Since events are delivered completely asynchronously (much
like normal hardware interrupts) they can occur at any point in
execution. Upon entering the event handler it is therefore necessary for
the guest to save the current state. When exiting an interrupt handler
on x86, it is common to use the &lt;strong&gt;IRET&lt;/strong&gt; instruction. This restores
control to the interrupted process and re-enables interrupts atomically.
Since events are an entirely software construct, the &lt;strong&gt;IRET&lt;/strong&gt;
instruction has no way of knowing how to re-enable them. There are two
solutions for this. Xen provides an &lt;strong&gt;IRET&lt;/strong&gt; hypercall that re-enables
event delivery via the hypervisor. The other way is to not re-enable
them atomically and handle errors when something goes wrong.&lt;/p&gt;
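
&lt;p&gt;The dispatching part of this can be sketched without any Xen headers. The following C snippet is a simplified model and not the code from event.c: two 64-bit variables stand in for the pending and mask bitmaps, and all names (&lt;strong&gt;bind_handler&lt;/strong&gt;, &lt;strong&gt;dispatch_events&lt;/strong&gt;, &lt;strong&gt;count_event&lt;/strong&gt;) are made up for this example. Saving the interrupted state and the atomic bit operations a real guest needs are left out.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define SKETCH_NR_EVENTS 64u

typedef void (*event_handler_t)(unsigned int port);

/* Stand-ins for the pending and mask bitmaps in the shared info page. */
static uint64_t pending_bits;
static uint64_t mask_bits;

static event_handler_t handlers[SKETCH_NR_EVENTS];
static unsigned int event_counts[SKETCH_NR_EVENTS];

/* Example handler that only counts how often a port fired. */
static void count_event(unsigned int port)
{
    event_counts[port]++;
}

static int bind_handler(unsigned int port, event_handler_t fn)
{
    if (port &amp;gt;= SKETCH_NR_EVENTS)
        return -1;
    handlers[port] = fn;
    return 0;
}

/* Called from the single upcall: run the handler of every pending,
 * unmasked event and return how many events were handled. */
static unsigned int dispatch_events(void)
{
    unsigned int handled = 0;
    uint64_t ready;

    while ((ready = pending_bits &amp;amp; ~mask_bits) != 0) {
        for (unsigned int port = 0; port &amp;lt; SKETCH_NR_EVENTS; port++) {
            uint64_t bit = 1ULL &amp;lt;&amp;lt; port;
            if (!(ready &amp;amp; bit))
                continue;
            /* Clear the bit before running the handler so that an event
             * arriving in the meantime sets it again. A real guest has
             * to do this with an atomic operation. */
            pending_bits &amp;amp;= ~bit;
            if (handlers[port])
                handlers[port](port);
            handled++;
        }
    }
    return handled;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Masked events stay pending in this model and are delivered by a later call once their mask bit has been cleared.&lt;/p&gt;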

&lt;p&gt;The actual implementation of the event handling has been taken in
large parts from the examples provided in David Chisnall’s book
“&lt;a href=&quot;https://www.informit.com/store/definitive-guide-to-the-xen-hypervisor-9780132349710&quot;&gt;The Definitive Guide to the Xen Hypervisor&lt;/a&gt;”. It is rather complex and involves a lot of jumping
from assembly code to C code and back. Explaining it in detail would go
beyond the scope of this post. The code can be found in the
&lt;strong&gt;entry.asm&lt;/strong&gt; and &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/kernel/event.c&quot;&gt;&lt;strong&gt;event.c&lt;/strong&gt;&lt;/a&gt; files.
For further information on how this works exactly, I would advise you to read
chapter 7 of the book.&lt;/p&gt;

&lt;h3 id=&quot;console-output&quot;&gt;Console output&lt;/h3&gt;

&lt;p&gt;Xen provides the user with the possibility to connect a virtual console
to a running guest. This enables the user to read the boot output of the
guest and interact with it in the terminal in a way that is very similar
to connecting to another computer via &lt;strong&gt;ssh&lt;/strong&gt;. The console can either be
attached when starting the guest by adding the &lt;strong&gt;“-c”&lt;/strong&gt; flag:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;xl create -c domain_config
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;or later on by issuing the following command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;xl console &amp;lt;Domain&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Inside the guest, the console is also implemented using a shared memory
page. The &lt;strong&gt;start_info_t&lt;/strong&gt; struct shown earlier contains the
machine page number of the console page and the event channel that is
used for communication. To initialize the console, the guest first has
to translate the machine page number of the console page into a physical
page number. To avoid confusion, it is helpful to clarify the different
types of memory addresses:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;machine address&lt;br /&gt;
address in the (real) machine’s address space running Xen&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;physical address&lt;br /&gt;
address in the (virtual) guest machine’s address space&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;virtual address&lt;br /&gt;
virtual address inside the guest&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Translating a machine page number into a physical page number is done
with the help of the &lt;strong&gt;machine_to_phys_mapping&lt;/strong&gt; macro defined in the
&lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/include/xen/arch-x86/xen-x86_64.h&quot;&gt;&lt;strong&gt;xen-x86_64.h&lt;/strong&gt;&lt;/a&gt;
header file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#define machine_to_phys_mapping ((unsigned long *)HYPERVISOR_VIRT_START)    
#define HYPERVISOR_VIRT_START xen_mk_ulong(__HYPERVISOR_VIRT_START)
#define __HYPERVISOR_VIRT_START 0xFFFF800000000000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
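
&lt;p&gt;The lookup itself is a plain array access. The sketch below uses a small local array in place of the real table that Xen maps at &lt;strong&gt;HYPERVISOR_VIRT_START&lt;/strong&gt;, and the helper name &lt;strong&gt;mfn_to_phys_addr&lt;/strong&gt; is made up for this example. It assumes 4 KiB pages, hence the shift by 12 bits.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Translate a machine frame number into a guest-physical address by
 * looking up the physical frame number in a machine-to-physical table. */
static uint64_t mfn_to_phys_addr(const unsigned long *m2p, unsigned long mfn)
{
    unsigned long pfn = m2p[mfn];
    return (uint64_t)pfn &amp;lt;&amp;lt; SKETCH_PAGE_SHIFT;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the table maps machine frame 2 to physical frame 9, the resulting physical address is 0x9000.&lt;/p&gt;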

&lt;p&gt;The shared console page contains a &lt;strong&gt;xencons_interface&lt;/strong&gt; struct which
is defined in the &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/include/xen/io/console.h&quot;&gt;&lt;strong&gt;console.h&lt;/strong&gt;&lt;/a&gt; header file.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct xencons_interface {
    char in[1024];
    char out[2048];
    XENCONS_RING_IDX in_cons, in_prod;
    XENCONS_RING_IDX out_cons, out_prod;
}; 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It consists of two ring buffers, one for input and one for output. To
write output to the console, the guest essentially only has to write
data into the output ring buffer. The control domain (dom0) then reads
the content of the buffer and outputs it into the terminal. The complete
code for initializing the console is shown in the listing below:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;int console_init(start_info_t * start)
{
    console = (struct xencons_interface*)
        ((machine_to_phys_mapping[start-&amp;gt;console.domU.mfn] &amp;lt;&amp;lt; 12));
    console_evt = start-&amp;gt;console.domU.evtchn;
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Writing a string to the console works in the following way:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;int console_write(char * message)
{
    struct evtchn_send event;
    event.port = console_evt;
    int length = 0;
    while(*message != '\0')
    {
        /* Wait for the back end to clear enough space in the buffer */
        XENCONS_RING_IDX data;
        do
        {
            data = console-&amp;gt;out_prod - console-&amp;gt;out_cons;
            HYPERVISOR_event_channel_op(EVTCHNOP_send, &amp;amp;event);
            mb();
        } while (data &amp;gt;= sizeof(console-&amp;gt;out));
        /* Copy the byte */
        int ring_index = MASK_XENCONS_IDX(console-&amp;gt;out_prod, console-&amp;gt;out);
        console-&amp;gt;out[ring_index] = *message;
        /* Ensure that the data really is in the ring before continuing */
        wmb();
        /* Increment input and output pointers */
        console-&amp;gt;out_prod++;
        length++;
        message++;
    }
    HYPERVISOR_event_channel_op(EVTCHNOP_send, &amp;amp;event);
    return length;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the guest has written data into the output buffer, it sends an
event channel notification to the dom0 with the help of the
&lt;strong&gt;HYPERVISOR_event_channel_op&lt;/strong&gt; hypercall. For a more convenient
usage, I also implemented a &lt;strong&gt;printf&lt;/strong&gt;-like function. The complete
implementation can be found in the &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_parav/kernel/console.c&quot;&gt;&lt;strong&gt;console.c&lt;/strong&gt;&lt;/a&gt; file.&lt;/p&gt;
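
&lt;p&gt;The index handling in &lt;strong&gt;console_write&lt;/strong&gt; relies on two properties of the ring: the producer and consumer indices run freely and are only masked when they are used to address the buffer, and the buffer size is a power of two. The following sketch shows this arithmetic in isolation. It assumes that &lt;strong&gt;MASK_XENCONS_IDX&lt;/strong&gt; masks the index with the buffer size minus one, which is how I understand console.h; the helper names are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define SKETCH_OUT_SIZE 2048u /* size of the out ring, a power of two */

/* Map a free-running index to a position inside the ring. */
static uint32_t ring_slot(uint32_t idx)
{
    return idx &amp;amp; (SKETCH_OUT_SIZE - 1u);
}

/* Bytes produced but not yet consumed. Unsigned arithmetic keeps this
 * correct even after the indices have wrapped around. */
static uint32_t ring_used(uint32_t prod, uint32_t cons)
{
    return prod - cons;
}

/* The writer has to wait while the ring is full. */
static int ring_full(uint32_t prod, uint32_t cons)
{
    return ring_used(prod, cons) &amp;gt;= SKETCH_OUT_SIZE;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An index of 2050 addresses slot 2 of the buffer, and the ring counts as full once the producer is 2048 bytes ahead of the consumer.&lt;/p&gt;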

&lt;h3 id=&quot;interrupts&quot;&gt;Interrupts&lt;/h3&gt;

&lt;p&gt;In addition to events, Xen provides a lower-level form of asynchronous
notification in the form of traps. Unlike events that can be dynamically
generated and bound, traps have a static meaning that corresponds
directly to hardware interrupts. When the guest is started on a physical
CPU, Xen installs an Interrupt Descriptor Table (&lt;a href=&quot;https://en.wikipedia.org/wiki/Interrupt_descriptor_table&quot;&gt;IDT&lt;/a&gt;)
on the guest’s behalf. Since traps correspond directly to hardware interrupts, the
same code can be used to handle them.&lt;/p&gt;

&lt;p&gt;To install an IDT, the guest has to use the
&lt;strong&gt;HYPERVISOR_set_trap_table&lt;/strong&gt; hypercall. It accepts an array of
&lt;strong&gt;trap_info_t&lt;/strong&gt; structs which contains one entry for every interrupt.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;struct trap_info {
    uint8_t       vector;  /* exception vector      */
    uint8_t       flags;   /* 0-3: privilege level  */
    uint16_t      cs;      /* code selector         */
    unsigned long address; /* code offset           */
};
typedef struct trap_info trap_info_t;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Each entry contains the number of the interrupt, the highest privilege
ring that can raise the interrupt and the address of the handler. All of
HermitCore’s functions relating to creating the IDT have been rewritten
to fill out such an array.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* We have defined exactly 66 interrupts */
static trap_info_t idt[67] = {[0 ... 66] = {0, 0, 0, 0}};
uint8_t trap_counter = 0;

static void configure_idt_entry(trap_info_t *dest_entry, uint8_t num, size_t base, uint16_t sel, uint8_t flags)
{
    /* The interrupt routine's base address */
    dest_entry-&amp;gt;address = base;
    dest_entry-&amp;gt;vector = num;
    /* The segment or 'selector' that this IDT entry will use
     *  is set here, along with any access flags */
    dest_entry-&amp;gt;cs = sel;
    dest_entry-&amp;gt;flags = flags;
}

void idt_set_gate(uint8_t num, size_t base, uint16_t sel, uint8_t flags)
{
    configure_idt_entry(&amp;amp;idt[trap_counter++], num, base, sel, flags);
}    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To configure an IDT entry, the &lt;strong&gt;idt_set_gate&lt;/strong&gt; function can be called
like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;idt_set_gate(0, (size_t)isr0, FLAT_KERNEL_CS, 0);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;FLAT_KERNEL_CS&lt;/strong&gt; is a segment selector defined by Xen. It represents the code
segment created by Xen mirroring a flat address space, where the entire
space is mapped into a single segment.&lt;/p&gt;

&lt;p&gt;When HermitCore has finished filling out the IDT entries, it calls the
&lt;strong&gt;idt_install&lt;/strong&gt; function.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;void idt_install(void)
{
    /* Issue a hypercall to install the new idt */
    int ret = HYPERVISOR_set_trap_table(idt);
    if (ret) {
        LOG_INFO(&quot;Failed to set Trap Table!\n&quot;);
        LOG_INFO(&quot;Error: %d\n&quot;, ret);
        asm volatile (&quot;hlt&quot;);
    }
    LOG_INFO(&quot;Installed new IDT with %d entries.\n&quot;, trap_counter);
}    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This issues the &lt;strong&gt;HYPERVISOR_set_trap_table&lt;/strong&gt; hypercall, which
instructs Xen to install the new IDT on behalf of the guest after it has
been validated.&lt;/p&gt;
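
&lt;p&gt;As far as I know, Xen reads the table passed to &lt;strong&gt;HYPERVISOR_set_trap_table&lt;/strong&gt; until it reaches an entry whose address is zero, so the array has to end with an empty entry. The sketch below counts the entries in front of that terminator. &lt;strong&gt;trap_info_sketch&lt;/strong&gt; copies the fields of &lt;strong&gt;trap_info&lt;/strong&gt; shown above; &lt;strong&gt;trap_table_length&lt;/strong&gt; and its &lt;strong&gt;capacity&lt;/strong&gt; parameter are made up for this example.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Same fields as the trap_info struct shown above. */
struct trap_info_sketch {
    uint8_t       vector;  /* exception vector */
    uint8_t       flags;   /* privilege level  */
    uint16_t      cs;      /* code selector    */
    unsigned long address; /* code offset      */
};

/* Count the entries in front of the first entry with address 0.
 * capacity bounds the scan in case the terminator is missing. */
static size_t trap_table_length(const struct trap_info_sketch *table, size_t capacity)
{
    size_t n = 0;

    while (n &amp;lt; capacity &amp;amp;&amp;amp; table[n].address != 0)
        n++;
    return n;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;An array that is one entry larger than the number of installed handlers and zero-initialized always keeps such a terminator at its end.&lt;/p&gt;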

&lt;h3 id=&quot;memory-management&quot;&gt;Memory Management&lt;/h3&gt;

&lt;p&gt;One of the original innovations of the Xen hypervisor was the
paravirtualized Memory Management Unit (&lt;a href=&quot;https://en.wikipedia.org/wiki/Memory_management_unit&quot;&gt;MMU&lt;/a&gt;),
which enabled fast and efficient virtualization of guests using paging. To virtualize the
memory subsystem, all hypervisors require an additional level of
abstraction between what a guest sees as physical memory and the memory
of the machine running the hypervisor. This is usually done via a Physical
to Machine (P2M) mapping that is managed by the hypervisor, i.e. it is
hidden from the guest’s operating system. The Xen MMU model, in contrast,
requires the guest to be aware of the P2M mapping. The guest’s operating
system must be modified so that page table entries contain
machine addresses instead of physical addresses. To
ensure that the guest cannot access memory areas it should not have
access to, Xen requires that all page table updates be performed by the
hypervisor. This means that the guest has read-only access to its page tables
and must issue hypercalls to update them.&lt;/p&gt;
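
&lt;p&gt;To illustrate the difference, the following sketch builds a page table entry the way a PV guest would have to: it looks up the machine frame number for a physical frame number and puts the machine address into the entry. The &lt;strong&gt;p2m&lt;/strong&gt; array stands in for the list of allocated page frames from &lt;strong&gt;start_info_t&lt;/strong&gt;, and the helper name, the 12-bit shift and the flag value are assumptions for this example. In a real guest the finished entry would then be handed to Xen in a hypercall rather than written to the page table directly.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-c_cpp&quot;&gt;#include &amp;lt;stdint.h&amp;gt;

#define SKETCH_PTE_SHIFT 12 /* assumption: 4 KiB pages */

/* Build a page table entry that refers to the machine frame backing
 * the given physical frame. p2m[pfn] yields the machine frame number. */
static uint64_t make_pv_pte(const unsigned long *p2m, unsigned long pfn, uint64_t flags)
{
    uint64_t mfn = p2m[pfn];
    return (mfn &amp;lt;&amp;lt; SKETCH_PTE_SHIFT) | flags;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If physical frame 1 is backed by machine frame 0x2A5, the entry with flags 0x3 becomes 0x2A5003.&lt;/p&gt;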

&lt;p&gt;Modifying HermitCore’s memory management to work with the invariants of
Xen would require rewriting large parts of it. Considering that
developing the memory management code has been a bachelor thesis of its
own, implementing these changes would definitely go beyond the scope
of my thesis. Instead, I modified HermitCore to run as a PVH guest
on Xen.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Modifying HermitCore to work as a purely paravirtualized guest on Xen
took a lot of changes. The lack of access to emulated hardware
components such as a PIT or APIC and the specific constraints imposed by
Xen make it necessary to rewrite many basic operating system
functions. Considering that this would include modifying large
parts of the memory management code, I changed the focus of the implementation.
Instead of running HermitCore as a purely paravirtualized
guest, I implemented the necessary changes to make it work as a
PVH guest.&lt;/p&gt;

&lt;p&gt;You will see in my next post that the ability of a PVH guest to manage and modify its own
page tables makes things a lot easier. Instead of rewriting large parts of the memory management
code, I could just use the existing code without modifications. I hope you will be back for the &lt;a href=&quot;/2018-12-09/os-xen-004/&quot;&gt;next part&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Jan&lt;/p&gt;

&lt;h3 id=&quot;further-reading&quot;&gt;Further reading&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.netbsd.org/docs/kernel/elf-notes.html&quot;&gt;ELF notes specification&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://wiki.xen.org/wiki/X86_Paravirtualised_Memory_Management&quot;&gt;Xen paravirtualized memory management&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.noteblok.net/wp-content/uploads/sites/3/2015/01/Self-referenced_Page_Tables-Vogel-ASPLOS_SrC.pdf&quot;&gt;Self-referencing page tables for the x86 architecture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.informit.com/store/definitive-guide-to-the-xen-hypervisor-9780132349710&quot;&gt;The Definitive Guide to the Xen Hypervisor&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sun, 25 Nov 2018 00:00:00 +0000</pubDate>
        <link>/2018-11-25/os-xen-003/</link>
        <guid isPermaLink="true">/2018-11-25/os-xen-003/</guid>
        
        <category>unikernel</category>
        
        <category>xen</category>
        
        <category>os</category>
        
        <category>hermitcore</category>
        
        
      </item>
    
      <item>
        <title>Porting an unikernel to Xen: HVM guest</title>
        <description>&lt;p&gt;Welcome back to the second part of my blog post series on how to port a unikernel to Xen. In the &lt;a href=&quot;/2018-11-11/os-xen-001/&quot;&gt;first part&lt;/a&gt; I gave an introduction to Xen, unikernels and HermitCore and explained the conceptual changes that have to be made to an operating system to work as a paravirtualized guest in Xen. This part will show you how I got HermitCore running as a fully virtualized guest in Xen. All the source code I show in this post can be found in the &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/tree/xen_hvm&quot;&gt;&lt;strong&gt;xen_hvm&lt;/strong&gt;&lt;/a&gt; branch of my HermitCore fork on GitLab. If you want to try it out for yourself, the &lt;strong&gt;docker&lt;/strong&gt; folder contains instructions on how to build HermitCore and the &lt;strong&gt;vagrant&lt;/strong&gt; folder includes a fully configured Xen hypervisor inside a Vagrant box.&lt;/p&gt;

&lt;h3 id=&quot;table-of-contents&quot;&gt;Table of contents&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;#create-a-bootable-grub-iso-image&quot;&gt;Create a bootable image&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#hvm-domain-configuration&quot;&gt;HVM domain configuration&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#create-a-wrapper-script&quot;&gt;Startup wrapper script&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Xen includes two main modes of operation. One is running paravirtualized guests and the other is hardware-assisted virtualization. CPUs that support hardware virtualization make it possible to run unmodified guests, including operating systems such as Microsoft Windows. In Xen this is called a hardware virtual machine (HVM). Xen uses device emulation based on &lt;a href=&quot;https://www.qemu.org/&quot;&gt;QEMU&lt;/a&gt; to provide I/O virtualization to the virtual machines. This means the virtual machines see an emulated version of a fairly basic PC.&lt;/p&gt;

&lt;p&gt;Because the virtualization of HVM guests is based on QEMU, it is possible to run HermitCore almost unmodified as a guest in Xen. When booting HermitCore via QEMU directly, the kernel is booted via the GRUB &lt;a href=&quot;https://www.gnu.org/software/grub/manual/multiboot/multiboot.html&quot;&gt;Multiboot&lt;/a&gt; protocol. Unfortunately it was not possible to make the Multiboot Binary bootloader, which is included in Xen, boot HermitCore directly.&lt;/p&gt;

&lt;p&gt;A simple alternative is to first start the &lt;a href=&quot;https://www.gnu.org/software/grub/&quot;&gt;GRUB&lt;/a&gt; bootloader and then boot HermitCore from there.&lt;/p&gt;

&lt;h2 id=&quot;create-a-bootable-grub-iso-image&quot;&gt;Create a bootable GRUB ISO image&lt;/h2&gt;

&lt;p&gt;The easiest way to create bootable media is to use the &lt;a href=&quot;https://www.gnu.org/software/grub/manual/grub/html_node/Invoking-grub_002dmkrescue.html&quot;&gt;&lt;strong&gt;grub-mkrescue&lt;/strong&gt;&lt;/a&gt; tool. It is included in every GRUB installation on major Linux distributions and can be used to create a bootable ISO file containing the GRUB bootloader. The GRUB modules are also needed. On Ubuntu, for example, they are included in the package &lt;strong&gt;grub-pc-bin&lt;/strong&gt;. To create a GRUB rescue disk, the folder structure of the ISO file has to be created first. It includes the HermitCore loader, the application that is supposed to run and a GRUB config file. It should look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;iso/
 *-- boot
    |-- grub
    |   *-- grub.cfg
    |-- hermit_application
    *-- ldhermit.elf
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;grub.cfg&lt;/strong&gt; file has the following content:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;default=0
timeout=0

menuentry &quot;HermitCore&quot; {
    multiboot /boot/ldhermit.elf
    module /boot/hermit_application
    boot
}    
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It instructs GRUB to create a menu entry called “HermitCore”, which boots the compiled binary file of the HermitCore loader via the Multiboot protocol. The actual application that is intended to run gets passed to the loader as a Multiboot-module. Additional command line options can be passed to the kernel after the &lt;strong&gt;multiboot&lt;/strong&gt; statement. For example&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;multiboot /boot/ldhermit.elf -uart=io:0x3f8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;would enable output on the serial port COM1. To create the ISO file, &lt;strong&gt;grub-mkrescue&lt;/strong&gt; needs to be invoked in the following way:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;grub-mkrescue -o /tmp/hermit.iso iso/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The resulting “hermit.iso” file contains the folder structure shown above. This ISO could already be booted via QEMU and it would work just fine. To boot it via Xen, a domain configuration file is still needed.&lt;/p&gt;
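
&lt;p&gt;The steps of this section can be collected in a small shell function. This is only a minimal sketch: the function name &lt;strong&gt;stage_iso_tree&lt;/strong&gt; and its arguments are made up for this example, and the paths in the example call are placeholders.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/bin/sh
# Sketch: stage the ISO folder structure described above.
# The function name and its arguments are assumptions for this example.
stage_iso_tree() {
    loader=$1   # path to the HermitCore loader (ldhermit.elf)
    app=$2      # path to the HermitCore application
    iso_dir=$3  # staging directory, for example iso

    mkdir -p "$iso_dir/boot/grub" || return 1
    cp "$loader" "$iso_dir/boot/ldhermit.elf" || return 1
    cp "$app" "$iso_dir/boot/hermit_application" || return 1

    # Write the same grub.cfg as shown above, one line per argument.
    printf '%s\n' \
        'default=0' \
        'timeout=0' \
        '' \
        'menuentry "HermitCore" {' \
        '    multiboot /boot/ldhermit.elf' \
        '    module /boot/hermit_application' \
        '    boot' \
        '}' > "$iso_dir/boot/grub/grub.cfg"
}

# Example call (placeholder paths), followed by the ISO creation:
#   stage_iso_tree bin/ldhermit.elf x86_64-hermit/extra/tests/hello iso
#   grub-mkrescue -o /tmp/hermit.iso iso/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Keeping the staging step separate from the call to &lt;strong&gt;grub-mkrescue&lt;/strong&gt; makes it easy to inspect the folder before the ISO is built.&lt;/p&gt;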

&lt;h2 id=&quot;hvm-domain-configuration&quot;&gt;HVM domain configuration&lt;/h2&gt;

&lt;p&gt;The following domain configuration file describes the HVM guest:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# This configures an HVM rather than PV guest
type = &quot;hvm&quot;

# Guest name
name = &quot;hermit-single.hvm&quot;

# Initial memory allocation (MB)
memory = 1024

# Number of VCPUS
vcpus = 1

# Network devices
vif = [ 'model=rtl8139,bridge=br0' ]

# Disk Devices
disk = [ 'file:/tmp/hermit.iso,hdc:cdrom,r' ]

# Disable VGA output
nographic = 1

# Serial console output
serial = [ 'file:/tmp/hermit.log' ]

# Use the native TSC, otherwise time measurements are wrong
tsc_mode=&quot;native&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Setting &lt;strong&gt;type=hvm&lt;/strong&gt; makes Xen create an HVM guest, which is assigned the given name, memory size and one virtual CPU. HermitCore supports the &lt;strong&gt;rtl8139&lt;/strong&gt; chipset for network devices, so a device using this chipset is added and connected to the bridge &lt;strong&gt;br0&lt;/strong&gt; (the name of the bridge device might be different on other Linux distributions). The ISO image created in the previous step is added as a CD-ROM and the graphical output is disabled. To see kernel boot messages, a serial port is attached and its output is written to a temporary log file. While the guest is running, &lt;strong&gt;“tail -f”&lt;/strong&gt; can be used to watch it boot. The last line is important, since it tells Xen how to emulate the Time-Stamp Counter (&lt;a href=&quot;https://en.wikipedia.org/wiki/Time_Stamp_Counter&quot;&gt;TSC&lt;/a&gt;) of the guest. &lt;strong&gt;Native&lt;/strong&gt; mode has to be used, otherwise the time measurements are wrong. A complete list of domain configuration options can be found in the &lt;a href=&quot;https://xenbits.xen.org/docs/unstable/man/xl.cfg.5.html&quot;&gt;&lt;strong&gt;xl.cfg&lt;/strong&gt;&lt;/a&gt; manual page. To start the domain, &lt;strong&gt;xl&lt;/strong&gt; has to be invoked as follows, where &lt;strong&gt;domain_config&lt;/strong&gt; is the configuration file created above:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;xl create /path/to/domain_config
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
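
&lt;p&gt;If the memory size or the number of virtual CPUs should be adjustable, the configuration file can also be generated. The following is a minimal sketch under a few assumptions: the function name &lt;strong&gt;write_domain_config&lt;/strong&gt;, its argument order and the file name /tmp/hermit.cfg are made up for this example, while all other values are taken from the configuration above.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/bin/sh
# Sketch: write the HVM domain configuration shown above to a file.
# Function name and argument order are assumptions for this example.
write_domain_config() {
    cfg=$1      # output file
    memory=$2   # initial memory allocation in MB
    vcpus=$3    # number of virtual CPUs (this post only uses 1)

    printf '%s\n' \
        'type = "hvm"' \
        'name = "hermit-single.hvm"' \
        "memory = $memory" \
        "vcpus = $vcpus" \
        "vif = [ 'model=rtl8139,bridge=br0' ]" \
        "disk = [ 'file:/tmp/hermit.iso,hdc:cdrom,r' ]" \
        'nographic = 1' \
        "serial = [ 'file:/tmp/hermit.log' ]" \
        'tsc_mode="native"' > "$cfg"
}

# Typical session on the Xen host:
#   write_domain_config /tmp/hermit.cfg 1024 1
#   xl create /tmp/hermit.cfg
#   tail -f /tmp/hermit.log        # watch the boot messages
#   xl destroy hermit-single.hvm   # clean up when finished
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;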

&lt;h3 id=&quot;create-a-wrapper-script&quot;&gt;Create a wrapper script&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;xen_hvm&lt;/strong&gt; branch on GitLab includes a &lt;a href=&quot;https://gitlab.com/JanMa/HermitCore/blob/xen_hvm/tools/xen-single-kernel.sh&quot;&gt;wrapper script&lt;/a&gt;, which performs all the steps shown above automatically. After building HermitCore, it can be found at the following path&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;HermitCore/build/local_prefix/opt/hermit/tools/xen-single-kernel.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and can be used as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Usage: xen-single-kernel.sh -hvlknmc args

This script starts a HermitCore application as a single kernel HVM Domain in Xen. Make sure to destroy your domain when you are finished!

Args:
  -l  Path to hermit loader
  -k  Path to hermit application
  -n  Only create files, do not start
  -m  Domain memory in MB (Default 1024)
  -c  Number of CPU cores (Default 1)
  -v  print VERSION
  -h  this help screen

Example:
  xen-single-kernel.sh -l bin/ldhermit.elf -k x86_64-hermit/extra/tests/hello -m 512
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
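
&lt;p&gt;To give an idea of how options like these can be handled in a shell script, here is a minimal sketch based on &lt;strong&gt;getopts&lt;/strong&gt;. It is not the script from the repository: the function name &lt;strong&gt;parse_args&lt;/strong&gt; and the variable names are made up for this example, and only the options that influence the domain are covered.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/bin/sh
# Sketch only: this is not the wrapper script from the repository.
# It parses a subset of the options listed in the usage text above.
parse_args() {
    # Defaults as documented in the usage text.
    loader='' app='' memory=1024 cpus=1 start=yes
    OPTIND=1
    while getopts 'l:k:nm:c:' opt "$@"; do
        case $opt in
            l) loader=$OPTARG ;;   # path to hermit loader
            k) app=$OPTARG ;;      # path to hermit application
            n) start=no ;;         # only create files, do not start
            m) memory=$OPTARG ;;   # domain memory in MB
            c) cpus=$OPTARG ;;     # number of CPU cores
            *) return 1 ;;         # unknown option or missing argument
        esac
    done
    # Loader and application are both required.
    [ -n "$loader" ] || return 1
    [ -n "$app" ] || return 1
}

# Example, matching the usage text above:
#   parse_args -l bin/ldhermit.elf -k x86_64-hermit/extra/tests/hello -m 512
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;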

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;HermitCore runs very well as an HVM guest in Xen. The recorded output of booting the “hello world” application can be seen below.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://gitlab.com/JanMa/HermitCore/raw/970edfe70e86710cdc9021bc34d51035ab34fca4/doc/image/demo.gif&quot; alt=&quot;Hello world boot process&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The only thing that does not work is the use of multiple CPU cores. When trying to start additional CPU cores, HermitCore fails to bring them online via an Inter-processor Interrupt (IPI), which then leads to an infinite loop while waiting for these cores to come online. A possible way to make this work would be to implement hypercalls, which will be discussed in the following posts. With the help of a special hypercall, it is possible to make Xen start the additional CPU cores. Also, boot times are much longer compared to all other variants of running HermitCore. I will provide a detailed performance comparison in a future post.&lt;/p&gt;

&lt;p&gt;This is it for part two. I hope you found it interesting again and will be back for the &lt;a href=&quot;/2018-11-17/os-xen-002/&quot;&gt;next one&lt;/a&gt;, where I will show you my process of trying to make HermitCore work as a completely paravirtualized guest in Xen.&lt;/p&gt;

&lt;p&gt;So long, Jan :-)&lt;/p&gt;
</description>
        <pubDate>Sat, 17 Nov 2018 00:00:00 +0000</pubDate>
        <link>/2018-11-17/os-xen-002/</link>
        <guid isPermaLink="true">/2018-11-17/os-xen-002/</guid>
        
        <category>unikernel</category>
        
        <category>xen</category>
        
        <category>os</category>
        
        <category>hermitcore</category>
        
        
      </item>
    
  </channel>
</rss>
