[MacPorts] #59497: openssh @8.1p1: sshd only works in debug mode

Thu Nov 7 09:58:13 UTC 2019

#59497: openssh @8.1p1: sshd only works in debug mode
-------------------------+----------------------
  Reporter:  davidfavor  |      Owner:  (none)
      Type:  defect      |     Status:  reopened
  Priority:  Normal      |  Milestone:
 Component:  ports       |    Version:  2.6.2
Resolution:              |   Keywords:
      Port:  openssh     |
-------------------------+----------------------

Comment (by Ionic):

 I finally understood what is going on, hooray. Leaving this here for
 future generations.

 The sandbox never really had a role to play in this issue. Rather, it was
 a combination of OpenSSH castrating itself and the OpenSSL crypto core
 being rewritten in `1.1.1*` and functioning completely differently
 compared to older releases (such as `1.1.0*`). The sandbox **would** have
 affected it, but it never came to that.

 What OpenSSL 1.1.1 uses, compared to older versions, is an "AES-CTR DRBG
 according to NIST standard SP 800-90Ar1". It also introduced chaining of
 crypto objects, such that each random number generator object can be
 hooked up to another one as its parent. They also introduced two global
 instances of this DRBG - one used for generating random numbers for use
 with public keys, the other one for generating random data for use with
 private keys. This makes the code more complicated, but trust me, that's
 actually a good thing!

 Each DRBG has a specific state it is in (`uninitialized`, `ready`,
 `error`) and a few pools with random data - for seeding, additional data
 and getting actual randomness out of it.

 When a DRBG is created (internally or externally, though for OpenSSH it's
 really an internal implementation detail in OpenSSL), the code creates
 a seed pool - initially comprised of seeding data the application provides
 - and then tries to get more entropy from the system to add to this pool.
 This means that a bad seed does not necessarily compromise the random
 number generator used by OpenSSL, which sounds good!

 When it's reseeded or random data is requested by the application, the
 internal state is checked. If it's not `READY` but `ERROR`, the DRBG is
 restarted (uninitialized and initialized again) in order to clear the
 error state - including, if applicable, its parent DRBG instances.
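
 To make the state handling a bit more tangible, here is a toy model in C.
 It is only a sketch of the behaviour described above: all names
 (`toy_drbg`, `toy_drbg_restart`, ...) are made up and do not match
 OpenSSL's internals, and as a simplification it treats an uninitialized
 instance the same way as one in the error state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not OpenSSL's real data structures. */
enum drbg_state { DRBG_UNINITIALISED, DRBG_READY, DRBG_ERROR };

struct toy_drbg {
    enum drbg_state state;
    struct toy_drbg *parent;    /* NULL for the top of the chain */
    bool (*get_entropy)(void);  /* stands in for "fetch system entropy" */
};

/* (Re)initialize: only succeeds if fresh entropy can be obtained. */
static bool toy_drbg_instantiate(struct toy_drbg *d)
{
    d->state = d->get_entropy() ? DRBG_READY : DRBG_ERROR;
    return d->state == DRBG_READY;
}

/* Clear a non-ready state by restarting the instance, parents first. */
static bool toy_drbg_restart(struct toy_drbg *d)
{
    if (d->parent != NULL && d->parent->state != DRBG_READY
        && !toy_drbg_restart(d->parent)) {
        d->state = DRBG_ERROR;
        return false;
    }
    d->state = DRBG_UNINITIALISED;
    return toy_drbg_instantiate(d);
}

/* Request random data: nothing is handed out unless the state is READY. */
static bool toy_drbg_generate(struct toy_drbg *d)
{
    if (d->state != DRBG_READY && !toy_drbg_restart(d))
        return false;           /* the failure is fatal for the caller */
    assert(d->state == DRBG_READY);
    return true;                /* real code would fill a buffer here */
}

/* Two stand-in entropy sources for experimenting with the model. */
static bool entropy_ok(void)   { return true; }
static bool entropy_fail(void) { return false; }
```

 With `entropy_ok` as the source, a child in the error state recovers
 together with its parent. With `entropy_fail`, every request keeps
 failing, which is the situation the forked child ends up in.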

 So... why does this fail in a forked OpenSSH child?

 As already explained, during initialization, system entropy is fetched
 through different means. On OS X/macOS, these consist of:

   1. using the `getentropy` system call to fill a buffer with random bytes
 (but THAT one is only available on 10.12 and higher!) XOR
   2. reading random data from system devices like `/dev/urandom`,
 `/dev/random`, `/dev/hwrng`, `/dev/srandom` and something else I've
 forgotten IFF they exist and can be opened successfully. Crucially, they
 are only opened once and the file descriptor left open for additional,
 later access if reading from the device actually returned useful data. XOR
   3. generating entropy via the `RDTSC` method that reads a high-
 resolution timer within the CPU XOR generating entropy via the
 `RDSEED`/`RDRAND` CPU instruction(s).

 There is no other entropy source defined in OpenSSL 1.1.1. For OS X/macOS
 this list is shortened further, because:
   - the `RDTSC` method is forcefully disabled within OpenSSL (quote:
 "IMPORTANT NOTE:  It is not currently possible to use this code because we
 are not sure about the amount of randomness it provides. Some SP900 tests
 have been run, but there is internal skepticism. So for now this code is
 not used.")
   - the `RDSEED`/`RDRAND` functions are implemented, but not enabled by
 default and we don't enable them. That's probably fine, because using a
 default-disabled function set in a security-related application feels
 weird.

 Additionally, both these methods would only be usable on `x86_64` (or
 maybe also `x86`) CPUs, which would leave out `ppc` ones for good.

 To recap, on 10.11 and below, the only entropy sources usable by OpenSSL
 are the system devices `/dev/urandom` and `/dev/random`.
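
 The device-based source with its cached file descriptor can be sketched
 roughly like this. This is not OpenSSL's code: `device_entropy` and
 `cached_fd` are hypothetical names, and only `/dev/urandom` is tried to
 keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: the descriptor is opened once and then kept. */
static int cached_fd = -1;

/* Returns 0 on success, -1 if the device cannot be opened or read. */
static int device_entropy(unsigned char *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);
    if (cached_fd == -1) {
        /* This open() is the step that needs a free descriptor slot. */
        cached_fd = open("/dev/urandom", O_RDONLY);
        if (cached_fd == -1)
            return -1;
    }
    while (got < len) {
        ssize_t n = read(cached_fd, buf + got, len - got);
        if (n == -1 && errno == EINTR)
            continue;           /* interrupted, simply retry */
        if (n <= 0) {
            /* No useful data: drop the descriptor again. */
            close(cached_fd);
            cached_fd = -1;
            return -1;
        }
        got += (size_t)n;
    }
    return 0;                   /* descriptor stays open for later calls */
}
```

 The second and all later calls reuse `cached_fd` and never call `open()`
 again as long as the device keeps returning data.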

 These would work fine, **but** OpenSSH pulls an additional trigger
 **after** enabling the sandbox:
 {{{
         /*
          * The kSBXProfilePureComputation still allows sockets, so
          * we must disable these using rlimit.
          */
         rl_zero.rlim_cur = rl_zero.rlim_max = 0;
         if (setrlimit(RLIMIT_FSIZE, &rl_zero) == -1)
                 fatal("%s: setrlimit(RLIMIT_FSIZE, { 0, 0 }): %s",
                         __func__, strerror(errno));
         if (setrlimit(RLIMIT_NOFILE, &rl_zero) == -1)
                 fatal("%s: setrlimit(RLIMIT_NOFILE, { 0, 0 }): %s",
                         __func__, strerror(errno));
         if (setrlimit(RLIMIT_NPROC, &rl_zero) == -1)
                 fatal("%s: setrlimit(RLIMIT_NPROC, { 0, 0 }): %s",
                         __func__, strerror(errno));
 }}}

 This code has been in there for longer than a decade as well. What it
 does is:
   1. disable creating new files with a file size greater than zero (so
 essentially writing any data to files... and sockets(?))
   2. disable OPENING any files or sockets to begin with
   3. disable spawning additional processes
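
 The effect of the `RLIMIT_NOFILE` part is easy to try out in isolation.
 The following is a small stand-alone probe, not OpenSSH code:
 `probe_nofile_zero` is a made-up name, and it forks so that the calling
 process keeps its own limits. I'm assuming a POSIX system where an
 exhausted descriptor limit is reported as `EMFILE`.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns a bitmask from a forked child that set RLIMIT_NOFILE to zero:
 * bit 0: open() was refused with EMFILE
 * bit 1: socket() was refused with EMFILE
 * Returns -1 if the probe itself could not be run.
 */
static int probe_nofile_zero(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct rlimit rl_zero, now;
        int result = 0;

        rl_zero.rlim_cur = rl_zero.rlim_max = 0;
        if (setrlimit(RLIMIT_NOFILE, &rl_zero) == -1)
            _exit(4);
        /* Sanity check that the limit really took effect. */
        assert(getrlimit(RLIMIT_NOFILE, &now) == 0 && now.rlim_cur == 0);

        if (open("/dev/urandom", O_RDONLY) == -1 && errno == EMFILE)
            result |= 1;
        if (socket(AF_UNIX, SOCK_STREAM, 0) == -1 && errno == EMFILE)
            result |= 2;
        _exit(result);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 4 ? -1 : WEXITSTATUS(status);
}
```

 If both bits are set in the return value, neither files nor sockets can
 be opened any longer once the limit is in place.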

 That's generally fine, because the forked child is only used for
 authentication and gets all its internal state from the parent instance it
 was forked from. It doesn't need to create additional files or network
 sockets and this makes the process more robust to outside tinkering by
 buffer overflows or the like. The sandbox also plays a big role in that
 hardening, of course.

 However, you might have noticed a conflict here: thusly spawned processes
 may **not** open any new files, but OpenSSL `1.1.1*` might need to (and,
 on older systems, **must**) open system crypto devices to garner entropy.
 Boom.

 This also explains why reseeding the DRBG(s) prior to enabling the sandbox
 works and continues to work afterwards: the operations succeed, open the
 crypto device and **leave it open**, keeping the file descriptor around.
 Subsequent reseeding operations can then continue to use it.
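
 A minimal sketch of that ordering, again as a stand-alone probe with
 made-up names (`child_checks`, `probe_preopened_fd`) and not taken from
 OpenSSH or OpenSSL: the device is opened first, the limit is applied
 afterwards, and the already open descriptor is then read from.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs in the child. Returns a bitmask:
 * bit 0: the descriptor opened earlier is still readable
 * bit 1: after closing it, a fresh open() is refused with EMFILE
 * Returns 4 if the limit could not be set.
 */
static int child_checks(int fd)
{
    struct rlimit rl_zero;
    unsigned char byte;
    int result = 0;

    assert(fd >= 0);    /* must have been opened before the limit */
    rl_zero.rlim_cur = rl_zero.rlim_max = 0;
    if (setrlimit(RLIMIT_NOFILE, &rl_zero) == -1)
        return 4;

    if (read(fd, &byte, 1) == 1)
        result |= 1;
    close(fd);
    if (open("/dev/urandom", O_RDONLY) == -1 && errno == EMFILE)
        result |= 2;
    return result;
}

/* Forks so that the caller's limits stay untouched; -1 on setup failure. */
static int probe_preopened_fd(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open("/dev/urandom", O_RDONLY);
        _exit(fd == -1 ? 4 : child_checks(fd));
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 4 ? -1 : WEXITSTATUS(status);
}
```

 Bit 0 shows that the limit only affects opening new descriptors, not
 using existing ones. Bit 1 shows the flip side: once the descriptor is
 closed, it cannot be reopened in that process.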

 But... why did this work for such a long time without generating errors?

 Previous OpenSSL versions (1.1.0 and older) are scary. They also
 initialize a random number generator if it wasn't previously initialized
 when requesting random data and that operation **would** generally also
 pull in system entropy via system crypto devices on OS X/macOS, but...
 failures to do so are non-fatal. That state is never recorded properly.
 Additionally, the random seed and random data in general seem to be
 getting hashed in previous versions in order to fill the pool. Also,
 failures to fill the pool with system entropy do not necessarily need to
 lead to failures when fetching random data within the application, since
 previous OpenSSL versions also mix in some "pseudo-random" data like the
 PID, user ID and current timestamp to the pool unconditionally. And the
 random pool data also seems to be getting hashed when requesting it in the
 application...

 Since OpenSSH only ever requests one byte of random data, that might be
 just enough to satisfy the condition.

 As far as I can tell, the error condition was just masked by OpenSSL's
 previous implementation.

 ----

 Now that we know what is going wrong, the remaining question is how to fix
 it.

 Calling the reseed function prior to enabling the sandbox is a valid
 workaround. By doing this, OpenSSL will open a file descriptor to some
 crypto device (typically `/dev/urandom`) and cache it. As soon as the
 device returns data, it shouldn't get closed, so we can continue to use it
 in the process. The caveat with that approach is that, should the device
 block at some point and NOT return more data, the file descriptor will be
 closed and OpenSSL will not be able to reopen it again. That normally
 shouldn't be the case for `/dev/urandom`, so I don't see this as a huge
 drawback.

 Alternatively, I could relax the limit on the number of open files (to
 what level, though?) and add a sandbox exception for the crypto devices.
 That would probably also work, but it would relax the security limitations
 a bit too much - i.e., the process could suddenly open and read other
 files as well. For this reason, I don't like that solution.

 I'll probably commit a fix with the first implementation tomorrow.

-- 
Ticket URL: <https://trac.macports.org/ticket/59497#comment:13>
MacPorts <https://www.macports.org/>
Ports system for macOS