    582f1fb6
    fs, close_range: add flag CLOSE_RANGE_CLOEXEC · 582f1fb6
    Giuseppe Scrivano authored
    
    
    When the flag CLOSE_RANGE_CLOEXEC is set, close_range does not close
    the files immediately; instead it sets the close-on-exec bit on them.
    
    It is useful for e.g. container runtimes that usually install a
    seccomp profile "as late as possible" before execv'ing the container
    process itself.  The container runtime could either do:
      1                                  2
    - install_seccomp_profile();       - close_range(MIN_FD, MAX_INT, 0);
    - close_range(MIN_FD, MAX_INT, 0); - install_seccomp_profile();
    - execve(...);                     - execve(...);
    
    Both alternatives have some disadvantages.
    
    In the first variant the seccomp profile cannot block the close_range
    syscall, nor the opendir/read/close/... syscalls used for the fallback
    on older kernels.
    In the second variant, close_range() can be used only on the fds
    that the runtime no longer needs, and it may have to be called
    multiple times to cover the different ranges that must be closed.
    
    Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues.
    The runtime can keep using its existing open fds, and the seccomp
    profile can block close_range() and the syscalls used for its fallback.
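
    As an illustration only, a runtime could mark its fds along the
    following lines.  This is a minimal sketch, not code from this patch:
    the helper name mark_cloexec_from() is hypothetical, the flag value is
    assumed to match the kernel UAPI header, and the /proc/self/fd walk is
    just one possible fallback for kernels without the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to match linux/close_range.h; defined here for older libc headers. */
#ifndef CLOSE_RANGE_CLOEXEC
#define CLOSE_RANGE_CLOEXEC (1U << 2)
#endif

/*
 * Hypothetical helper: set close-on-exec on every open fd >= min_fd
 * without closing anything.  Returns 0 on success, -1 on failure.
 */
static int mark_cloexec_from(int min_fd)
{
    assert(min_fd >= 0);

#ifdef SYS_close_range
    /* Fast path: one syscall covers the whole range. */
    if (syscall(SYS_close_range, (unsigned int)min_fd, ~0U,
                (unsigned int)CLOSE_RANGE_CLOEXEC) == 0)
        return 0;
    /* Any failure (e.g. ENOSYS or EINVAL on older kernels): fall through. */
#endif

    /* Fallback: walk /proc/self/fd and set FD_CLOEXEC one fd at a time. */
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int dir_fd = dirfd(dir);
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(entry->d_name, &end, 10);

        /* Skip "." and ".." and anything that is not a plain number. */
        if (end == entry->d_name || *end != '\0')
            continue;
        /* Skip fds below the range and the directory stream's own fd. */
        if (fd < min_fd || fd == dir_fd)
            continue;

        int flags = fcntl((int)fd, F_GETFD);
        if (flags >= 0)
            fcntl((int)fd, F_SETFD, flags | FD_CLOEXEC);
    }
    closedir(dir);
    return 0;
}
```

    With such a helper the runtime would call mark_cloexec_from(MIN_FD)
    first, then install the seccomp profile, then execve(); the marked fds
    stay usable until the exec and are closed by the kernel at that point.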
    
    Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
    Link: https://lore.kernel.org/r/20201118104746.873084-2-gscrivan@redhat.com
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>