[MacPorts] #64368: cmake @3.22.1_0: "Error running link command: Invalid argument" when run under trace mode

MacPorts noreply at macports.org
Thu Jan 27 05:12:41 UTC 2022


#64368: cmake @3.22.1_0: "Error running link command: Invalid argument" when run
under trace mode
---------------------------+-----------------------
  Reporter:  chrstphrchvz  |      Owner:  michaelld
      Type:  defect        |     Status:  assigned
  Priority:  Normal        |  Milestone:
 Component:  ports         |    Version:  2.7.1
Resolution:                |   Keywords:  tracemode
      Port:  cmake         |
---------------------------+-----------------------

Old description:

> I observe that when building ports under trace mode, the file descriptors
> given to processes can reach/exceed the default value of `FD_SETSIZE`,
> which is 1024 when not manually defined before including <sys/select.h>.
> CMake—and possibly other programs—are compiled without manually defining
> `FD_SETSIZE` on macOS, and without defining `_DARWIN_UNLIMITED_SELECT` or
> `_DARWIN_C_SOURCE`. This leads to at least two problems using CMake under
> trace mode: `Error running link command: Invalid argument` being output;
> and out-of-bounds memory accesses, one effect being corruption of the
> `Invalid argument` message.
>
> More detailed description:
>
> If a program is compiled without defining `_DARWIN_UNLIMITED_SELECT` (on
> macOS 10.5 Leopard and later, and before including <sys/time.h> for more
> recent macOS versions) or `_DARWIN_C_SOURCE` (unsure which macOS
> versions), then calling `select()` for more than 1024 file descriptors
> fails, and `errno` is set to `EINVAL`.
>
> The `select()` call in `kwsysProcessWaitForPipe()` in
> [https://gitlab.kitware.com/cmake/cmake/-/blob/v3.22.1/Source/kwsys/ProcessUNIX.c#L1227
> ProcessUNIX.c] (inherited from KWSys—Kitware System Library) is affected
> by this limitation: `select()` will fail when `max+1` exceeds 1024. One
> use of the affected call is when running `cmake -E cmake_link_script`. If
> a trace mode build uses enough file descriptors to cause the `select()`
> call to fail, then `strerror(errno)` is used to get the error message
> (for `EINVAL`, it is `"Invalid argument"`), which CMake eventually prints
> out as:
>
> {{{
> Error running link command: Invalid argument
> }}}
>

> Using the `FD_CLR()`/`FD_SET()`/`FD_ISSET()` macros with too high of a
> file descriptor is known to cause out-of-bounds accesses (on various
> platforms—not just macOS), as `sizeof(fd_set)` is directly dependent on
> `FD_SETSIZE`. On macOS, these macros implement bounds checking, but only
> for an `fd_set` residing on the stack, and only on very recent macOS
> versions. The relevant `fd_set` in CMake is a member of a structure
> allocated on the heap, and so corruption/bogus reads do occur for file
> descriptors which are too high: one example is corruption to the error
> message buffer located only a few slots after the `fd_set` in the same
> structure, causing CMake to instead output e.g.
> {{{
> Error running link command: Invalid `rgument
> }}}
>
> or
>
> {{{
> Error running link command: In^Valid argument
> }}}
>
> where `^V` is the ASCII SYN character.
>
> Some programs (e.g. Tcl, in tclUnixPort.h) manually define `FD_SETSIZE`
> to `OPEN_MAX` (from <sys/syslimits.h>). Currently, CMake only defines
> `FD_SETSIZE` manually (in ProcessUNIX.c) for Cygwin.
>
> I observe that defining `FD_SETSIZE` to `OPEN_MAX` and defining
> `_DARWIN_UNLIMITED_SELECT` in ProcessUNIX.c prevents `select()` from
> failing with `EINVAL` when called for more than 1024 file descriptors.
>
> {{{#!diff
> --- Source/kwsys/ProcessUNIX.c.orig
> +++ Source/kwsys/ProcessUNIX.c
> @@ -40,7 +40,13 @@
>  #if defined(__CYGWIN__)
>  /* Increase the file descriptor limit for select() before including
>     related system headers. (Default: 64) */
>  #  define FD_SETSIZE 16384
> +#elif defined(__APPLE__)
> +/* Increase the file descriptor limit for select() before including
> +   related system headers. (Default: 1024) */
> +#  define _DARWIN_UNLIMITED_SELECT
> +#  include <sys/syslimits.h> /* OPEN_MAX */
> +#  define FD_SETSIZE OPEN_MAX
>  #endif
>
>  #include <assert.h>    /* assert */
> }}}
> I believe that both of these definitions are necessary to avoid issues. I
> observe that if `_DARWIN_UNLIMITED_SELECT` is defined but `FD_SETSIZE` is
> left at its default value, the `fd_set` structure is still not large
> enough to avoid out-of-bounds accesses by
> `FD_SET()`/`FD_CLR()`/`FD_ISSET()`; and the `select()` call, which
> previously would fail, now erroneously returns early (because `fd_set` is
> not large enough to indicate which file to block on), causing CMake to
> poll/busy-wait rather than block as intended, but without causing the
> build to fail. (I wonder if it is possible for the opposite to happen,
> where `select()` erroneously blocks on a file descriptor that `fd_set` is
> too small to describe,  and cause the build to stall indefinitely; this
> would resemble unresolved trace mode issues I have noticed and reported
> for other ports.)
>
> The port I used to investigate this issue is `mysql8`, but I hesitate to
> claim that making these changes to CMake would fix #63455, at least for
> anyone other than myself, since I’ve already led others who are much more
> knowledgable to believe that that issue is due to outdated libtool. I
> also don’t know if this fixes #55086.
>
> I don’t know whether these changes are something upstream CMake/KWSys
> should incorporate, because I don’t know whether this is necessarily a
> bug in CMake, i.e. whether CMake (particularly when running
> `cmake_link_script` or other command needing `select()`) is normally able
> to get a file descriptor 1024 or higher, and encounter this issue outside
> of trace mode. But I do find that it is possible, at least on certain
> recent macOS versions, to naïvely compile a simple program which
> successfully opens more than 1024 files without defining `FD_SETSIZE` or
> `_DARWIN_UNLIMITED_SELECT`.
>
> I don’t know if file descriptors growing to unusually large values under
> trace mode is expected behavior. If it isn’t—e.g. due to file descriptor
> leak in trace mode itself, or not cleaning up after programs with file
> descriptor leaks—then I don’t know whether it is causing “Too many open
> files” errors I sometimes observe under trace mode, but have neglected to
> report. I also don’t know if this issue should instead indicate something
> missing from the design of trace mode, like abstraction of file
> descriptor values. And I don’t know if this issue should be considered
> grave enough to condemn previous trace mode builds because
> `FD_SET()`/`FD_CLR()`/`FD_ISSET()` may have been silently causing
> programs to misbehave due to invalid memory accesses.
>
> One idea for making this issue easier to reproduce and find affected
> programs would be to reserve file descriptors under 1024 at the beginning
> of trace mode, forcing programs to use file descriptors 1024 and higher.
> Also note that parallel building—and the number of processors if
> enabled—are factors in triggering this issue, as these affects the order
> in which steps in a build are done, and in turn how large the file
> descriptor values are at a certain point in the build.

New description:

 I observe that when building ports under trace mode, the file descriptors
 given to processes can reach/exceed the default value of `FD_SETSIZE`,
 which is 1024 when not manually defined before including <sys/select.h>.
 CMake—and possibly other programs—are compiled without manually defining
 `FD_SETSIZE` on macOS, and without defining `_DARWIN_UNLIMITED_SELECT` or
 `_DARWIN_C_SOURCE`. This leads to at least two problems using CMake under
 trace mode: `Error running link command: Invalid argument` being output;
 and out-of-bounds memory accesses, one effect being corruption of the
 `Invalid argument` message.

 More detailed description:

 If a program is compiled without defining `_DARWIN_UNLIMITED_SELECT` (on
 macOS 10.5 Leopard and later, and before including <sys/time.h> for more
 recent macOS versions) or `_DARWIN_C_SOURCE` (unsure which macOS
 versions), then calling `select()` for more than 1024 file descriptors
 fails, and `errno` is set to `EINVAL`.

 The `select()` call in `kwsysProcessWaitForPipe()` in
 [https://gitlab.kitware.com/cmake/cmake/-/blob/v3.22.1/Source/kwsys/ProcessUNIX.c#L1227
 ProcessUNIX.c] (inherited from KWSys—Kitware System Library) is affected
 by this limitation: `select()` will fail when `max+1` exceeds 1024. One
 use of the affected call is when running `cmake -E cmake_link_script`. If
 a trace mode build uses enough file descriptors to cause the `select()`
 call to fail, then `strerror(errno)` is used to get the error message (for
 `EINVAL`, it is `"Invalid argument"`), which CMake eventually prints out
 as:

 {{{
 Error running link command: Invalid argument
 }}}


 Using the `FD_CLR()`/`FD_SET()`/`FD_ISSET()` macros with too high of a
 file descriptor is known to cause out-of-bounds accesses (on various
 platforms—not just macOS), as `sizeof(fd_set)` is directly dependent on
 `FD_SETSIZE`. On macOS, these macros implement bounds checking, but only
 for an `fd_set` residing on the stack, and only on very recent macOS
 versions. The relevant `fd_set` in CMake is a member of a structure
 allocated on the heap, and so corruption/bogus reads do occur for file
 descriptors which are too high: one example is corruption to the error
 message buffer located only a few slots after the `fd_set` in the same
 structure, causing CMake to instead output e.g.
 {{{
 Error running link command: Invalid `rgument
 }}}

 or

 {{{
 Error running link command: In^Valid argument
 }}}

 where `^V` is the ASCII SYN character.

 Some programs (e.g. Tcl, in tclUnixPort.h) manually define `FD_SETSIZE` to
 `OPEN_MAX` (from <sys/syslimits.h>). Currently, CMake only defines
 `FD_SETSIZE` manually (in ProcessUNIX.c) for Cygwin.

 I observe that defining `FD_SETSIZE` to `OPEN_MAX` and defining
 `_DARWIN_UNLIMITED_SELECT` in ProcessUNIX.c prevents `select()` from
 failing with `EINVAL` when called for more than 1024 file descriptors.

 {{{#!diff
 --- Source/kwsys/ProcessUNIX.c.orig
 +++ Source/kwsys/ProcessUNIX.c
 @@ -40,7 +40,13 @@
  #if defined(__CYGWIN__)
  /* Increase the file descriptor limit for select() before including
     related system headers. (Default: 64) */
  #  define FD_SETSIZE 16384
 +#elif defined(__APPLE__)
 +/* Increase the file descriptor limit for select() before including
 +   related system headers. (Default: 1024) */
 +#  define _DARWIN_UNLIMITED_SELECT
 +#  include <limits.h> /* OPEN_MAX */
 +#  define FD_SETSIZE OPEN_MAX
  #endif

  #include <assert.h>    /* assert */
 }}}
 I believe that both of these definitions are necessary to avoid issues. I
 observe that if `_DARWIN_UNLIMITED_SELECT` is defined but `FD_SETSIZE` is
 left at its default value, the `fd_set` structure is still not large
 enough to avoid out-of-bounds accesses by
 `FD_SET()`/`FD_CLR()`/`FD_ISSET()`; and the `select()` call, which
 previously would fail, now erroneously returns early (because `fd_set` is
 not large enough to indicate which file to block on), causing CMake to
 poll/busy-wait rather than block as intended, but without causing the
 build to fail. (I wonder if it is possible for the opposite to happen,
 where `select()` erroneously blocks on a file descriptor that `fd_set` is
 too small to describe,  and cause the build to stall indefinitely; this
 would resemble unresolved trace mode issues I have noticed and reported
 for other ports.)

 The port I used to investigate this issue is `mysql8`, but I hesitate to
 claim that making these changes to CMake would fix #63455, at least for
 anyone other than myself, since I’ve already led others who are much more
 knowledgable to believe that that issue is due to outdated libtool. I also
 don’t know if this fixes #55086.

 I don’t know whether these changes are something upstream CMake/KWSys
 should incorporate, because I don’t know whether this is necessarily a bug
 in CMake, i.e. whether CMake (particularly when running
 `cmake_link_script` or other command needing `select()`) is normally able
 to get a file descriptor 1024 or higher, and encounter this issue outside
 of trace mode. But I do find that it is possible, at least on certain
 recent macOS versions, to naïvely compile a simple program which
 successfully opens more than 1024 files without defining `FD_SETSIZE` or
 `_DARWIN_UNLIMITED_SELECT`.

 I don’t know if file descriptors growing to unusually large values under
 trace mode is expected behavior. If it isn’t—e.g. due to file descriptor
 leak in trace mode itself, or not cleaning up after programs with file
 descriptor leaks—then I don’t know whether it is causing “Too many open
 files” errors I sometimes observe under trace mode, but have neglected to
 report. I also don’t know if this issue should instead indicate something
 missing from the design of trace mode, like abstraction of file descriptor
 values. And I don’t know if this issue should be considered grave enough
 to condemn previous trace mode builds because
 `FD_SET()`/`FD_CLR()`/`FD_ISSET()` may have been silently causing programs
 to misbehave due to invalid memory accesses.

 One idea for making this issue easier to reproduce and find affected
 programs would be to reserve file descriptors under 1024 at the beginning
 of trace mode, forcing programs to use file descriptors 1024 and higher.
 Also note that parallel building—and the number of processors if
 enabled—are factors in triggering this issue, as these affects the order
 in which steps in a build are done, and in turn how large the file
 descriptor values are at a certain point in the build.

--

Comment (by chrstphrchvz):

 Adjusted patch: POSIX would say to include <limits.h> for `OPEN_MAX`
 (which is also where Tcl actually picks it up from), not
 <sys/syslimits.h>.

 Submitted upstream:
 https://gitlab.kitware.com/cmake/cmake/-/merge_requests/6909

-- 
Ticket URL: <https://trac.macports.org/ticket/64368#comment:5>
MacPorts <https://www.macports.org/>
Ports system for macOS


More information about the macports-tickets mailing list