Docker: Checkpoint and Restore
Before I do a deeper dive into Firecracker, I wanted to run a quick experiment and check whether it’s possible to take a snapshot of a Docker container and restore it at some point. I’m far from being the first to think about this, and there’s a fantastic team working on CRIU: https://criu.org/. CRIU is a project that implements checkpoint/restore functionality for Linux. As the description says, it doesn’t work on Windows or Mac, so here I go again, dusting off my Thinkpad X220.
It’s been a while since I did anything on Linux and Ubuntu, so I discovered there’s a new package manager called “Snap”. I tried it with Docker, and something messed up the permissions on the docker.sock file: whenever I start the Docker service, the socket file is always created as root:root. Perhaps I’ll try snap again next time, but for now I’m back to apt.
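To see who owns the socket, a small check like the sketch below can help. It assumes GNU stat (as shipped with Ubuntu). The name sock_owner is a hypothetical helper, and /var/run/docker.sock is only the usual default path, which a snap install may not use.

```shell
# Hypothetical helper: print "owner:group" for the given file.
# Relies on GNU stat's -c format option (an assumption; BSD stat differs).
sock_owner() {
    stat -c '%U:%G' "$1"
}

# Typical use (path is the common default, adjust if yours differs):
#   sock_owner /var/run/docker.sock
```

If the group printed is root rather than docker, adding your user to the docker group will not be enough to reach the daemon without sudo.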
Install Docker
Install Docker by following instructions from https://docs.docker.com/engine/install/ubuntu/:
Add Docker’s GPG key:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Verify that you now have the key with the fingerprint 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88 by searching for the last 8 characters of the fingerprint:
$ sudo apt-key fingerprint 0EBFCD88
pub   rsa4096 2017-02-22 [SCEA]
      9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
uid           [ unknown] Docker Release (CE deb) <docker@docker.com>
sub   rsa4096 2017-02-22 [S]
Set up the stable repository:
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
Install Docker:
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
Add your user to the docker group:
$ sudo usermod -aG docker ${USER}
Log out and back in so the new group membership takes effect.
Validate that Docker runs:
$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
Install CRIU
Follow the instructions from https://criu.org/Docker:
Install CRIU:
$ sudo add-apt-repository ppa:criu/ppa
$ sudo apt install criu
Enable experimental features in Docker:
$ echo '{"experimental": true}' | sudo tee -a /etc/docker/daemon.json
$ sudo systemctl restart docker
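Appending to daemon.json only produces valid JSON when the file does not exist yet or is empty. A more careful variant might look like the sketch below. The function name enable_experimental is hypothetical, and it deliberately refuses to touch a file that already holds other settings rather than trying to merge JSON in plain shell.

```shell
# Hypothetical helper: write the experimental flag only when it is safe to do so.
# $1 is the path to the daemon config (normally /etc/docker/daemon.json).
enable_experimental() {
    conf="$1"
    if [ ! -s "$conf" ]; then
        # Missing or empty file: create it with just the experimental flag.
        printf '{"experimental": true}\n' > "$conf"
    elif grep -q '"experimental"' "$conf"; then
        # The key is already present; do not append a second JSON object.
        echo "$conf already mentions \"experimental\"; leaving it alone" >&2
    else
        # Other settings exist; merging needs a JSON-aware tool or a manual edit.
        echo "$conf has other settings; add \"experimental\": true by hand" >&2
        return 1
    fi
}
```

The function has to run as root to write under /etc/docker, and the daemon still needs a restart afterwards. Once it is back up, docker version --format '{{.Server.Experimental}}' should print true.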
Run a test container that prints an incrementing counter every second:
$ docker run -d --name looper --security-opt seccomp:unconfined busybox \
   /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
Verify that logs are showing up:
$ docker logs looper
0
1
2
3
Create a checkpoint, which will stop the container:
$ docker checkpoint create looper checkpoint1
checkpoint1
Restore the container:
$ docker start --checkpoint checkpoint1 looper
Check logs one more time:
$ docker logs looper
0
1
2
3
0
1
2
As you may notice, something went wrong at the last step: the test container restarts the counter from 0 instead of resuming from the checkpoint. I’m not sure why this is happening. I’m running Docker 20.10.3 and CRIU 3.15 on Linux 5.8.15, which satisfies the CRIU requirements.
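Spotting the reset by eye is easy with seven lines of output, but for longer runs a small filter can do it. This is a rough sketch: first_reset is a hypothetical helper that reads one counter value per line on stdin and prints the line number where the value first drops below the previous one.

```shell
# Hypothetical helper: find the first line where the counter goes backwards.
# Prints that line number and returns 0 if a reset is found, returns 1 otherwise.
first_reset() {
    awk 'NR > 1 && $1 + 0 < prev { print NR; found = 1; exit }
         { prev = $1 + 0 }
         END { exit found ? 0 : 1 }'
}

# Typical use against the test container from this post:
#   docker logs looper | first_reset
```

For the log shown above (0 1 2 3 0 1 2) this reports line 5, the point where the restored container started counting from 0 again.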
At the same time, if I try something different, like running an alpine container, creating a folder, and taking a snapshot, it works as expected:
$ docker run -it alpine sh
/ # mkdir ~/hello
$ docker ps
CONTAINER ID   IMAGE    COMMAND   CREATED         STATUS         PORTS   NAMES
a82ccad8fdfa   alpine   "sh"      2 minutes ago   Up 1 second            clever_kapitsa
$ docker checkpoint create a82ccad8fdfa checkpoint1
checkpoint1
$ docker ps
$ docker start --checkpoint checkpoint1 a82ccad8fdfa
$ docker ps
CONTAINER ID   IMAGE    COMMAND   CREATED         STATUS         PORTS   NAMES
a82ccad8fdfa   alpine   "sh"      2 minutes ago   Up 1 second            clever_kapitsa
$ docker exec -it a82ccad8fdfa sh
/ # cd
~ # ls
hello
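The checkpoint and restore steps used in both experiments can be wrapped in a small function. This is only a sketch: checkpoint_and_restore and the DOCKER override variable are hypothetical names, added so the function can be dry-run with a stand-in command. The two docker subcommands are the ones shown above.

```shell
# Hypothetical wrapper around the two commands used in this post.
# $1 = container name or ID, $2 = checkpoint name.
# Set DOCKER to another command (e.g. echo) for a dry run; defaults to docker.
checkpoint_and_restore() {
    container="$1"
    checkpoint="$2"
    docker_cmd="${DOCKER:-docker}"
    # Creating a checkpoint stops the container.
    "$docker_cmd" checkpoint create "$container" "$checkpoint" || return 1
    # Start the stopped container again from the saved checkpoint.
    "$docker_cmd" start --checkpoint "$checkpoint" "$container"
}

# Typical use:
#   checkpoint_and_restore looper checkpoint1
```

Running it with DOCKER=echo prints the two commands instead of executing them, which is a cheap way to check the argument order before touching a real container.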
I think CRIU will be useful at some point with Firecracker as well. Resource-wise, in some cases it doesn’t make sense to keep user containers running at all times if no one is using them.