Run a Firecracker on Nomad

Nomad is an orchestration system like Kubernetes, maintained by HashiCorp. It’s designed with simplicity in mind, and this article sums it up nicely: https://www.nomadproject.io/docs/nomad-vs-kubernetes. What’s important is that it has a plugin for Firecracker VMMs, and I’m going to give it a try today.

Install Nomad locally:

$ wget https://releases.hashicorp.com/nomad/1.0.4/nomad_1.0.4_linux_amd64.zip
$ unzip nomad_1.0.4_linux_amd64.zip
$ sudo cp nomad /usr/local/bin
$ nomad --version
Nomad v1.0.4 (9294f35f9aa8dbb4acb6e85fa88e3e2534a3e41a)
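
Before copying the binary into /usr/local/bin it may be worth checking the archive against the SHA256SUMS file HashiCorp publishes next to each release. This is only a sketch: verify_release is a made-up helper name, and it assumes GNU sha256sum and that both files sit in the current directory.

```shell
# verify_release <archive> <sums-file>
# Succeeds only if the archive's SHA-256 matches its entry in the sums file.
# (hypothetical helper, not part of Nomad)
verify_release() {
  # Pick the single line whose filename column equals the archive name,
  # then let sha256sum recompute and compare the hash.
  awk -v f="$1" '$2 == f' "$2" | sha256sum --check --status -
}

# Possible usage:
#   wget https://releases.hashicorp.com/nomad/1.0.4/nomad_1.0.4_SHA256SUMS
#   verify_release nomad_1.0.4_linux_amd64.zip nomad_1.0.4_SHA256SUMS && echo "checksum OK"
```

If the archive is not listed in the sums file at all, sha256sum gets empty input and the check fails, which is the safe default.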

Install a Firecracker plugin:

$ go get github.com/cneira/firecracker-task-driver
$ mkdir plugins
$ cp ~/go/bin/firecracker-task-driver plugins/

Install CNI plugins. CNI stands for Container Network Interface; it defines a generic, plugin-based networking solution for containers.

$ git clone https://github.com/containernetworking/plugins.git cni-plugins
$ cd cni-plugins
$ ./build_linux.sh
$ sudo mkdir -p /opt/cni/bin
$ sudo cp bin/* /opt/cni/bin/

Install the tc-redirect-tap plugin:

$ git clone https://github.com/awslabs/tc-redirect-tap
$ cd tc-redirect-tap
$ make
$ sudo cp tc-redirect-tap /opt/cni/bin/

Add a sample CNI configuration file at /etc/cni/conf.d/default.conflist. Name the file after the network (default here, matching the "name" field) with the .conflist extension:

{
  "name": "default",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "firewall"
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}
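
Every "type" in the conflist, including the one inside "ipam", refers to a plugin executable, so a binary missing from /opt/cni/bin is an easy thing to overlook. A rough check could look like the sketch below. The function names are made up for illustration, and the grep-based parsing is deliberately simple: it assumes each "type" key and its value sit on the same line, as they do in the sample file.

```shell
# Print every "type" value found in a CNI config file, one per line.
# (hypothetical helper; naive text matching, not a real JSON parser)
cni_plugin_types() {
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# missing_cni_plugins <conflist> [bin-dir]
# Print the plugin types that have no executable in bin-dir.
missing_cni_plugins() {
  conflist=$1
  bindir=${2:-/opt/cni/bin}
  for t in $(cni_plugin_types "$conflist"); do
    [ -x "$bindir/$t" ] || echo "$t"
  done
}

# Possible usage:
#   missing_cni_plugins /etc/cni/conf.d/default.conflist
```

With everything installed as above, the command should print nothing; any name it does print is a plugin that still needs to be copied into /opt/cni/bin.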

Prepare a Nomad config file:

plugin_dir = "<path-to-plugins>/plugins"
plugin "firecracker-task-driver" {}
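
Nomad scans the directory given in plugin_dir for task driver binaries, so the path has to point at the directory created earlier. One way to avoid typing it by hand is sketched below. write_nomad_config and the nomad.hcl filename are assumptions of mine, and the agent command in the comment assumes a single-node dev setup run as root.

```shell
# write_nomad_config <plugins-dir> <output-file>
# Resolve the plugins directory to an absolute path and write a minimal
# agent config that enables the Firecracker task driver.
# (hypothetical helper)
write_nomad_config() {
  plugins=$(cd "$1" && pwd) || return 1
  cat > "$2" <<EOF
plugin_dir = "$plugins"
plugin "firecracker-task-driver" {}
EOF
}

# Possible usage, from the directory that contains plugins/:
#   write_nomad_config plugins nomad.hcl
#   sudo nomad agent -dev -config=nomad.hcl
```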

Create a test job task01.conf:

job "example" {
  datacenters = ["dc1"]
  type = "service"

  task "test01" {
    driver = "firecracker-task-driver"

    config {
      KernelImage = "<path-to-vmlinux>/vmlinux"
      BootDisk    = "<path-to-rootfs>/rootfs.ext4"
      BootOptions = "console=ttyS0 noapic reboot=k panic=1 pci=off nomodules rw"
      Firecracker = "/usr/local/bin/firecracker"
      Vcpus       = 1
      Mem         = 128
      Network     = "default"
    }
  }
}
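
The job file points at three things that have to exist on the Nomad client: the kernel image, the root filesystem, and the firecracker binary. A small pre-flight check, sketched here with made-up function names, can catch a leftover placeholder path before the job is submitted. It assumes each setting is written on one line as Key = "value", like in the job above.

```shell
# job_config_value <jobfile> <key>
# Print the quoted value assigned to <key> in the job file.
# (hypothetical helper; simple line matching, not an HCL parser)
job_config_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p" "$1"
}

# check_job_paths <jobfile>
# Report any of the three file settings whose path does not exist.
check_job_paths() {
  rc=0
  for key in KernelImage BootDisk Firecracker; do
    path=$(job_config_value "$1" "$key")
    if [ ! -e "$path" ]; then
      echo "missing $key: $path"
      rc=1
    fi
  done
  return $rc
}

# Possible usage:
#   check_job_paths task01.conf && nomad job validate task01.conf
```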

Run a Firecracker instance on Nomad:

$ nomad run task01.conf

$ nomad job status
ID            = example
Name          = example
Submit Date   = 2021-02-27T12:41:00-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
test        0       0         0        0       2         0
test01      0       0         1        0       0         0

Latest Deployment
ID          = 909de29b
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
test01      1        1       1        0          2021-02-27T14:24:12-05:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
50aa7ae4  d0ee8bf5  test01      4        run      running   12s ago    2s ago

$ nomad alloc status 50aa7ae4
ID                  = 50aa7ae4-e487-24bf-896b-01a310fa1eb8
Eval ID             = b2732817
Name                = example.test01[0]
Node ID             = d0ee8bf5
Node Name           = zoidberg
Job ID              = example
Job Version         = 4
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 20s ago
Modified            = 10s ago
Deployment ID       = 909de29b
Deployment Health   = healthy

Task "test01" is "running"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  300 MiB  300 MiB  

Task Events:
Started At     = 2021-02-27T19:14:02Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2021-02-27T14:14:02-05:00  Started     Task started by client
2021-02-27T14:14:02-05:00  Task Setup  Building Task Directory
2021-02-27T14:14:02-05:00  Received    Task received by client

$ ping 192.168.127.1
PING 192.168.127.1 (192.168.127.1) 56(84) bytes of data.
64 bytes from 192.168.127.1: icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from 192.168.127.1: icmp_seq=2 ttl=64 time=0.102 ms

Stop the job:

$ nomad job stop example

Stopping the job does not remove the veth interfaces. This one-liner brings the leftover ones down:

$ for veth in $(ifconfig | grep "^veth" | cut -d' ' -f1 | cut -d':' -f1); do sudo ip link set $veth down; done
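
Setting a link down leaves the interface in place. To actually delete the leftovers, something along these lines might work. It is a sketch: veth_names is a hypothetical helper that parses the one-line-per-interface output of ip -o link show, and the loop in the comment would delete every veth on the host, including ones owned by other tools such as Docker, so filter the list first if that matters.

```shell
# veth_names
# Read `ip -o link show` output on stdin and print the names of veth
# interfaces, stripping the "@ifN" peer suffix.
# (hypothetical helper)
veth_names() {
  awk -F': ' '$2 ~ /^veth/ { sub(/@.*/, "", $2); print $2 }'
}

# Possible usage:
#   for veth in $(ip -o link show | veth_names); do
#     sudo ip link delete "$veth"
#   done
```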

I have to say that Nomad is a breath of fresh air, especially after setting up a Kubernetes cluster. It does feel much simpler and easier to manage.