Adding pidfd support to CRIU: GSoC 2024
CRIU (Checkpoint/Restore in Userspace) allows you to freeze a running application and save it to disk (in the form of a set of image files).
We can restore (or restart) the application using these image files.
CRIU allows for container migration, snapshots, and the like. It is integrated into LXC, Docker, and Podman.
My project was implementing support for process file descriptors (pidfds) in CRIU.
What's a pidfd anyway?
Processes in Linux are identified using PIDs (Process IDs).
A PID is just an integer (its upper limit can be read from /proc/sys/kernel/pid_max).
PIDs are handed out sequentially; when the counter reaches pid_max, it wraps around and the kernel starts reassigning PIDs that are no longer in use.
This is known as PID recycling.
This creates a problem for other processes that were using a PID to refer to the older process: the same number may now identify an unrelated process.
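To make the recycling problem concrete, here is a toy model in Python. It is only a sketch: the class ToyPidAllocator and its methods are made up for illustration, and the real kernel allocator is considerably more involved. It shows how a PID remembered by one party can end up naming a different process after a wrap-around.

```python
class ToyPidAllocator:
    """Hands out PIDs cyclically, skipping ones still in use (toy model only)."""

    def __init__(self, pid_max: int):
        self.pid_max = pid_max
        self.live = {}        # pid -> process name
        self.next_pid = 1

    def spawn(self, name: str) -> int:
        for _ in range(self.pid_max - 1):
            pid = self.next_pid
            # Wrap around to 1 once the counter reaches pid_max.
            self.next_pid = pid + 1 if pid + 1 < self.pid_max else 1
            if pid not in self.live:
                self.live[pid] = name
                return pid
        raise RuntimeError("no free PIDs")

    def exit(self, pid: int) -> None:
        del self.live[pid]


alloc = ToyPidAllocator(pid_max=4)   # valid PIDs in this model: 1, 2, 3
stale = alloc.spawn("worker")        # someone remembers this PID
alloc.spawn("a")
alloc.spawn("b")
alloc.exit(stale)                    # the worker exits, its PID becomes free
reused = alloc.spawn("unrelated")    # the counter has wrapped around
print(stale == reused, alloc.live[stale])
```

Anyone still holding the number in stale would now be talking to "unrelated" without noticing.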
To provide a stable handle on a process (or a thread), pidfds were introduced.
You can create a pidfd in two ways:
- pidfd_open(): creates a pidfd for a specified PID.
- clone() or clone3() with the CLONE_PIDFD flag: creates a pidfd at process creation time.
Using the pidfd, you can send signals to the process using pidfd_send_signal() and obtain a duplicate of one of the process's file descriptors using pidfd_getfd().
Implementation
At the time of writing, no code related to pidfd support had been merged.
Update on 3 October 2024: This PR has been merged!
CRIU already has many abstractions for adding support for new kinds of file descriptors, so part of the
development consisted of following the patterns that already existed for other file descriptors.
The two problems unique to pidfds were:
- Obtaining the correct PID: The PID in the Pid field of /proc/$pid/fdinfo/$pidfd is incorrect when
checkpoint/restore happens inside a PID namespace. We instead use the last entry of the NSpid field,
which gives the PID in the most deeply nested PID namespace.
- pidfds that point to dead processes: If we dump a pidfd that points to a dead process,
we do not have access to the PID it pointed to. But we do have to maintain equality between pidfds, i.e.,
if two pidfds pointed to the same dead process before C/R, they should continue to do so after C/R.
We solve this problem by creating a temporary process for each unique inode and opening pidfds that point
to this temporary process. After all pidfds have been opened, we kill this temporary process.
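Both ideas can be sketched in a few lines of Python. This is not CRIU's code (CRIU is written in C); the function names and the (fd, inode) input format are assumptions made for illustration. The first helper takes the text of an fdinfo file and returns the last NSpid entry. The other two group dead pidfds by inode and reopen each group against one short-lived helper process, so pidfds that were equal before stay equal afterwards. The last function assumes Linux 5.3+ and Python 3.9+.

```python
import os
import signal
from collections import defaultdict


def pid_in_innermost_ns(fdinfo_text: str) -> int:
    """Return the last NSpid entry from the text of /proc/$pid/fdinfo/$pidfd."""
    for line in fdinfo_text.splitlines():
        if line.startswith("NSpid:"):
            return int(line.split()[-1])
    raise ValueError("no NSpid field found")


def group_by_inode(dead_pidfds):
    """Map each inode to the fds that shared it; input is (fd, inode) pairs."""
    groups = defaultdict(list)
    for fd, inode in dead_pidfds:
        groups[inode].append(fd)
    return dict(groups)


def reopen_dead_pidfds(dead_pidfds):
    """Return {old_fd: new_pidfd}; fds with the same inode share one dead helper."""
    restored = {}
    for inode, fds in group_by_inode(dead_pidfds).items():
        helper = os.fork()
        if helper == 0:
            signal.pause()            # temporary process: just wait to be killed
            os._exit(0)
        for fd in fds:
            restored[fd] = os.pidfd_open(helper)
        # All pidfds for this inode are open, so the helper can go away.
        os.kill(helper, signal.SIGKILL)
        os.waitpid(helper, 0)
    return restored
```

For example, with an fdinfo text containing the line "NSpid: 4242 17 1", pid_in_innermost_ns returns 1, and group_by_inode([(3, 100), (4, 100), (5, 200)]) puts fds 3 and 4 in the same group, so they would be reopened against the same helper.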
Learnings
- How helpful man-pages are.
- Understanding kernel commit messages and a little bit of kernel code.
- I learned a lot of container-related concepts, such as namespaces and cgroups, and how containers work under the hood.
- Understood IPC (inter-process communication) and multithreading at a deeper level.
- GSoC was my first time working in a collaborative environment, and it gave me a sneak peek into professional development.
What's Left to Do
Support for the PIDFD_THREAD flag: PIDFD_THREAD allows opening pidfds that point to a single
thread instead of a process. I will be working on this after Google Summer of Code.
Work Done After GSoC
- Fixed a bug with SIGCHLD being raised incorrectly for the temporary (tmp) process.
- Identified a bug in the current implementation of restoring dead pidfds.
- How we handle dead pidfds has changed after GSoC. I contributed a test case to the change.
- Added a page to the CRIU docs explaining how checkpoint/restore of pidfds works.
Acknowledgements
Huge thank you to my mentor, Aleksandr Mikhalitsyn, for guiding me
and taking the extra time to explain concepts that helped me understand Linux and CRIU at a deeper level.
Thank you to Andrei Vagin and Radostin Stoyanov for reviewing my pull request alongside Alex.