Support for Files on Unmounted Mounts: GSoC 2025

What's CRIU?

CRIU (Checkpoint/Restore in Userspace) lets you freeze a running application and save it to disk as a set of image files. The application can later be restored (restarted) from these image files. This enables container migration, snapshots, and similar features. CRIU is integrated into LXC, Docker, and Podman.

The Project

To correctly checkpoint/restore a process (or container), CRIU needs to dump all file descriptors the process has open, and thus the mounts these fds are on. CRIU already does a pretty good job of this. However, if the file is on a mount that has been "unmounted" (umount2(mnt, MNT_DETACH)), we can't get information about the mount from /proc/$pid/mountinfo. We want to devise a way to get mountinfo for such mounts and support checkpoint/restore for files on these mounts.
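To make the gap concrete, here is a minimal sketch (not CRIU's actual code) of the two lookups involved. The helper names fd_mount_id() and mountinfo_has_id() are hypothetical; the sketch assumes the usual procfs formats, where /proc/self/fdinfo/<fd> contains a "mnt_id:" line and each mountinfo line starts with the mount ID.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the mount ID of an open fd from /proc/self/fdinfo/<fd>.
 * Returns the ID, or -1 if it cannot be determined. */
static int fd_mount_id(int fd)
{
	char path[64], line[256];
	int mnt_id = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "mnt_id: %d", &mnt_id) == 1)
			break;
	}
	fclose(f);
	return mnt_id;
}

/* Return 1 if mnt_id is the first field of some line in a
 * mountinfo-format file, 0 if it is not, -1 if the file is unreadable. */
static int mountinfo_has_id(const char *mountinfo_path, int mnt_id)
{
	char line[4096];
	int id, found = 0, at_line_start = 1;
	FILE *f;

	assert(mountinfo_path != NULL);
	f = fopen(mountinfo_path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* Only parse the ID at the start of a line, so that very
		 * long lines split across reads are not misinterpreted. */
		if (at_line_start && sscanf(line, "%d", &id) == 1 && id == mnt_id) {
			found = 1;
			break;
		}
		at_line_start = strchr(line, '\n') != NULL;
	}
	fclose(f);
	return found;
}
```

For an fd on a regular mount, mountinfo_has_id("/proc/self/mountinfo", fd_mount_id(fd)) should return 1. For an fd on a lazily unmounted mount, the ID is expected to be missing from mountinfo, which is exactly the case this project addresses.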

Problems

  • Initially, I assumed that statmount() worked on "unmounted" mounts. This was incorrect. Thus, we had to come up with a way to export mountinfo for "unmounted" mounts using statmount().
  • Even if we had mountinfo, we would still have to figure out some way to checkpoint/restore the mount. Since the mount has been "unmounted", we can't really get full information about the mount and the files on it. So, for now, we reduced the scope to supporting "unmounted" bind mounts. This is simpler because we assume that the "unmounted" mount is a bind mount of a regular mount, so during restore we can simply bind-mount (MS_BIND) from the regular mount.

Linux Kernel Patches

We suggested two different approaches to export mountinfo for "unmounted" mounts using statmount():
  1. A new mount namespace in the kernel for "unmounted" mounts: When a mount gets "unmounted", it is removed from all mount namespaces. We suggested that such mounts instead be added to a kernel-only mount namespace. We can then call statmount() with this umount_mnt_ns namespace's mnt_ns_id to get the mountinfo we need. We also modified statx to export mnt_ns_id. Thus, to get mountinfo for an fd, we could do this:

    statx(fd, "", AT_EMPTY_PATH, MNT_ID_UNIQUE | MNT_NS_ID, &stat);
    struct mnt_id_req req = { .size = MNT_ID_REQ_SIZE_VER1, .mnt_id = stat.mnt_id, .mnt_ns_id = stat.mnt_ns_id, .flags = /* flags */ };
    statmount(&req, &statmount_buf, buf_size, 0);

    We sent our patch upstream, but this approach has a few problems: it complicates the implementation of mount namespaces and mounts in the kernel (see the discussion on the patchset). We got feedback that the second approach was better.

  2. Adding an fd parameter to statmount(): We added an fd field to struct mnt_id_req and a new STATMOUNT_BY_FD flag to statmount(). This allows statmount() to directly return mountinfo for the mount the fd is on, even if the fd is on an "unmounted" mount. Here's how it would work:

    struct mnt_id_req req = { .size = MNT_ID_REQ_SIZE_VER1, .fd = fd, .flags = /* flags */ };
    statmount(&req, &statmount_buf, buf_size, STATMOUNT_BY_FD);

    The maintainers have mostly found this approach more acceptable. We have sent out two versions of the fd-based statmount() patches, v2 and v3. We are in the process of incorporating their comments and hope to upstream these changes soon.

CRIU Changes

Since we focused only on supporting "unmounted" bind mounts, the CRIU changes were relatively simple. We did the following:
  1. Wrote a bunch of tests trying to cover all edge cases we could think of.
  2. Introduced the statmount() syscall in the CRIU codebase and added kerndat (the mechanism CRIU uses to check whether the kernel supports a feature) checks for statmount().
  3. Added the ability to checkpoint/restore "unmounted" bind mounts.
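As an illustration of what a feature check like the one in step 2 can look like, here is a minimal sketch of probing for the statmount() syscall. It is not CRIU's kerndat code: the function name probe_statmount() is hypothetical, and the fallback syscall number 457 is an assumption that holds for x86-64 and the other architectures sharing the unified syscall numbering (statmount() was added in Linux 6.8).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Fallback for build hosts whose kernel headers predate statmount().
 * Assumption: 457 is correct for the target architecture. */
#ifndef __NR_statmount
#define __NR_statmount 457
#endif

/* Returns 1 if the kernel appears to implement statmount(), 0 if the
 * syscall fails with ENOSYS. */
static int probe_statmount(void)
{
	/* A kernel that implements the call rejects the NULL request
	 * with an error other than ENOSYS, without side effects. */
	long ret = syscall(__NR_statmount, NULL, NULL, (size_t)0, 0U);

	if (ret == -1 && errno == ENOSYS)
		return 0;
	return 1;
}
```

A probe like this only tells us that the syscall exists. It cannot tell whether the kernel also carries the fd-based extension, and a seccomp filter that answers with a different errno would make it report a false positive, so a real check would need to be stricter.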

Checkpoint/Restore of "unmounted" bind mounts

Changes During Dump Stage

Before it starts dumping fds, CRIU constructs a list of all mounts called mntinfo. So, if we encounter a mnt_id that is not in the mntinfo list while dumping fds, we assume it belongs to an "unmounted" mount. We then try to get mountinfo for this mount using the statmount() syscall (with the changes we made) and mark the mount as "unmounted" in the image.
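The lookup-with-fallback logic described above can be sketched as follows. This is a simplified illustration, not CRIU's code: struct mnt_entry, resolve_fd_mount(), and stub_query() are hypothetical names, and the callback stands in for the patched statmount() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an entry in the mount list. */
struct mnt_entry {
	int mnt_id;
	bool detached;           /* true if found via the fallback path */
	struct mnt_entry *next;
};

/* Callback that fills in an entry for a mount missing from the list,
 * for example by querying statmount(). Returns 0 on success. */
typedef int (*query_fn)(int fd, struct mnt_entry *out);

/* Stand-in query used for illustration: always succeeds. */
static int stub_query(int fd, struct mnt_entry *out)
{
	(void)fd;
	(void)out;
	return 0;
}

/* Find an entry by mount ID; NULL if the ID was never collected. */
static struct mnt_entry *lookup_mnt(struct mnt_entry *head, int mnt_id)
{
	for (struct mnt_entry *m = head; m; m = m->next)
		if (m->mnt_id == mnt_id)
			return m;
	return NULL;
}

/* Resolve the mount of an fd during dump. A known mnt_id returns the
 * collected entry. An unknown one is assumed to be "unmounted": it is
 * queried through the callback, marked detached, and added to the list
 * using caller-provided storage. Returns NULL if the query fails. */
static struct mnt_entry *resolve_fd_mount(struct mnt_entry **head, int fd,
					  int mnt_id, query_fn query,
					  struct mnt_entry *storage)
{
	struct mnt_entry *m;

	assert(head && query && storage);
	m = lookup_mnt(*head, mnt_id);
	if (m)
		return m;
	if (query(fd, storage) != 0)
		return NULL;
	storage->mnt_id = mnt_id;
	storage->detached = true;
	storage->next = *head;
	*head = storage;
	return storage;
}
```

Adding the detached entry to the list means a second fd on the same "unmounted" mount resolves through the normal lookup instead of triggering another query.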

Changes During Restore Stage

Restoration of "unmounted" mounts takes place outside the general mount restore process.
  1. We create a list of detached_mounts during the collect phase.
  2. For each mount, we create a temporary mountpoint of the form /.criu.detached.XXXXXX (since detached mounts don't have a mountpoint of their own).
  3. While opening fds, we open them at /.criu.detached.XXXXXX/$filename.
  4. After all files are restored, we unmount the temporary mounts using MNT_DETACH and remove all our mountpoints.
Link to the CRIU Pull Request
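
The path handling in the restore steps above can be sketched as follows. This is an illustration under assumptions, not the code from the pull request: the helper names are hypothetical, the root directory is a parameter so the sketch can run unprivileged, and the actual bind mount and umount2(path, MNT_DETACH) calls are left out because they require privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DETACHED_TEMPLATE ".criu.detached.XXXXXX"

/* Create a unique directory <root>/.criu.detached.XXXXXX and store its
 * path in buf. Returns 0 on success, -1 on error. */
static int make_detached_mountpoint(const char *root, char *buf, size_t len)
{
	int n;

	assert(root && buf);
	n = snprintf(buf, len, "%s/%s", root, DETACHED_TEMPLATE);
	if (n < 0 || (size_t)n >= len)
		return -1;
	/* mkdtemp() replaces the trailing XXXXXX and creates the directory. */
	return mkdtemp(buf) ? 0 : -1;
}

/* Build the path at which a file on the detached mount is reopened. */
static int detached_file_path(const char *mountpoint, const char *filename,
			      char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s/%s", mountpoint, filename);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Remove the temporary mountpoint. In the real flow a lazy unmount of
 * the mountpoint would come first. */
static int remove_detached_mountpoint(const char *mountpoint)
{
	return rmdir(mountpoint);
}
```

Passing an empty string as root yields a path of the form /.criu.detached.XXXXXX, matching the format in step 2.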

What's Left To Do?

Kernel Side
  1. Get the patch merged.
  2. Write tests (kselftests).
  3. Update statmount() docs.
In the future, we can also think about how to export information about anonymous mounts. Anonymous mounts can be created with the new mount API, using either open_tree() with OPEN_TREE_CLONE or fsmount(). Files can be opened on these anonymous mounts as well.

CRIU Side
  1. Add support for "unmounted" mounts that are not bind mounts. This is harder because we can't get information about the filesystem for these mounts. We can think about restoring only the things we know and recreating the file system "partially".
  2. Add support for anonymous mounts as well.

Learnings

This was my first time working with the Linux kernel. It ended up being pretty fun, and I got to learn the entire kernel development process.

Work After GSoC

I am currently focused on getting the kernel patches merged.

Acknowledgements

This project would not have been possible without my mentor Pavel Tikhomirov, who helped me in every part of the process.