Adding pidfd support to CRIU: GSoC 2024

CRIU (Checkpoint/Restore in Userspace) lets you freeze a running application and save its state to disk as a set of image files. The application can later be restored (or restarted) from these image files. This makes container migration, snapshots, and similar features possible. CRIU is integrated into LXC, Docker, and Podman.
My project was implementing support for process file descriptors (pidfds) in CRIU.

What's a pidfd anyway?

Processes in Linux are identified using PIDs (Process IDs). A PID is just an integer; its upper limit can be read from /proc/sys/kernel/pid_max. When PID allocation reaches pid_max, the kernel wraps around and starts reusing PIDs that were freed by exited processes. This is known as PID recycling.
This creates a problem for other processes that were using a PID to refer to the older process, because the same number may now refer to an unrelated process. To provide a stable handle on a process (or a thread), pidfds were introduced.
You can create a pidfd in two ways:
  • pidfd_open(): creates a pidfd for a specified PID
  • clone3() (or clone()) with the CLONE_PIDFD flag: creates a pidfd at process creation time.
Using the pidfd, you can send signals to the process with pidfd_send_signal() and obtain a duplicate of one of the process's file descriptors with pidfd_getfd().
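
As a rough illustration of these calls, the sketch below uses the wrappers in Python's standard library (os.pidfd_open() and signal.pidfd_send_signal(), available since Python 3.9 on Linux 5.3 or newer) instead of the raw system calls. The function name kill_child_via_pidfd is made up for this example.

```python
import os
import signal


def kill_child_via_pidfd():
    """Fork a child, then signal it through a pidfd instead of its PID."""
    pid = os.fork()
    if pid == 0:
        # Child: sleep until a signal arrives.
        signal.pause()
        os._exit(0)

    # Parent: the pidfd keeps referring to this particular child, even if
    # the numeric PID were later recycled for another process.
    pidfd = os.pidfd_open(pid)
    try:
        signal.pidfd_send_signal(pidfd, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)  # reap the child
        # Report which signal terminated the child (None if it exited normally).
        return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    finally:
        os.close(pidfd)
```

On a kernel with pidfd support, kill_child_via_pidfd() should return signal.SIGKILL. Python's standard library has no wrapper for pidfd_getfd(), so that call is not shown here.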

Implementation

Link to Pull Request: https://github.com/checkpoint-restore/criu/pull/2449

At the time of writing, no code related to pidfd support had been merged.

Update on 3 October 2024: This PR has been merged!

CRIU already has a great number of abstractions for adding support for various kinds of file descriptors. Part of the development work was following the patterns that already exist for other file descriptors.
The two problems unique to pidfds were:
  • Obtaining the correct PID: The PID shown in the Pid field of /proc/$pid/fdinfo/$pidfd is incorrect when checkpoint/restore happens inside a PID namespace. Instead, we use the last entry of the NSpid field, which gives the PID in the most deeply nested PID namespace.

  • pidfds that point to dead processes: If we dump a pidfd that points to a dead process, we no longer have access to the PID it pointed to. However, we still have to preserve equality between pidfds, i.e., if two pidfds pointed to the same dead process before C/R, they should continue to do so after C/R. We solve this problem by creating a temporary process for each unique inode and opening pidfds that point to this temporary process. After all pidfds have been opened, we kill this temporary process.
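
Both ideas can be sketched in a few lines. This is only an illustration in Python, not the code from the pull request (CRIU itself is written in C). The names SAMPLE_FDINFO, innermost_pid, and reopen_dead_pidfds are hypothetical, and the fdinfo contents are invented sample values.

```python
import os
import signal

# Hypothetical fdinfo contents for a pidfd whose target lives in a nested
# PID namespace; all numbers are invented for illustration.
SAMPLE_FDINFO = (
    "pos:\t0\n"
    "flags:\t02000002\n"
    "mnt_id:\t15\n"
    "ino:\t4242\n"
    "Pid:\t7813\n"
    "NSpid:\t7813\t12\n"
)


def innermost_pid(fdinfo_text):
    """Return the last NSpid entry, i.e. the PID in the innermost namespace."""
    for line in fdinfo_text.splitlines():
        if line.startswith("NSpid:"):
            return int(line.split()[-1])
    raise ValueError("no NSpid field found")


def reopen_dead_pidfds(pidfds_per_inode):
    """Map each inode to fresh pidfds that all point to one dead helper.

    pidfds_per_inode maps an inode number to how many pidfds shared it.
    """
    restored = {}
    for inode, count in pidfds_per_inode.items():
        helper = os.fork()
        if helper == 0:
            # Temporary process: just wait to be killed.
            signal.pause()
            os._exit(0)
        # Every pidfd for this inode refers to the same helper process.
        restored[inode] = [os.pidfd_open(helper) for _ in range(count)]
        # Once all pidfds are open, kill and reap the helper.
        os.kill(helper, signal.SIGKILL)
        os.waitpid(helper, 0)
    return restored
```

With the sample text above, innermost_pid(SAMPLE_FDINFO) picks 12, the last NSpid entry, rather than 7813. After reopen_dead_pidfds() returns, the pidfds stay valid as file descriptors, but signalling through them fails because the helper is gone.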

Learnings

  • How helpful man-pages are.
  • Understanding kernel commit messages and a little bit of kernel code.
  • I learned a lot of concepts related to containers, such as namespaces and cgroups, and how containers work under the hood.
  • I came to understand IPC (inter-process communication) and multithreading at a deeper level.
  • GSoC was my first time working in a collaborative environment, and it gave me a sneak peek into professional development.

What's Left to Do

Support for the PIDFD_THREAD flag: PIDFD_THREAD allows opening a pidfd that points to a single thread instead of a process. I will be working on this after Google Summer of Code.

Work Done After GSoC

Acknowledgements

Huge thank you to my mentor, Aleksandr Mikhalitsyn, for guiding me and taking the extra time to explain concepts that helped me understand Linux and CRIU at a deeper level. Thank you to Andrei Vagin and Radostin Stoyanov for reviewing my pull request alongside Alex.