Buildbot Performance

Christopher Nielsen mascguy at rochester.rr.com
Sun May 16 14:48:09 UTC 2021


In terms of the ratio of vCPUs to GB of RAM, 1:1 isn’t totally unreasonable. However, we should also reserve 2 GB of RAM for the OS, including the disk cache. So perhaps 6 vCPUs would be a better choice.
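That sizing rule can be sketched as follows (the 8 GB per-VM figure is an assumption for illustration; the 1:1 vCPU-to-GB ratio and 2 GB OS reservation are from the discussion above):

```python
# Hypothetical sizing sketch: 1 vCPU per GB of RAM, after reserving
# 2 GB for the OS and disk cache. The 8 GB VM size is assumed.
vm_ram_gb = 8
os_reserve_gb = 2

usable_gb = vm_ram_gb - os_reserve_gb  # RAM left for builds
vcpus = usable_gb // 1                 # 1:1 vCPU-to-GB ratio

print(vcpus)  # 6
```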

As for the total physical CPUs available on our Xserves, here’s the rub: While hyperthreading does provide some benefit, best-case it generally provides only about 50% more headroom, and sometimes as little as 25%.

So if we assume the best case, our Xserves only provide the processing power of roughly 12 CPU cores once hyperthreading is accounted for. So even if only two builders are active, we’re already well overcommitted on CPU. And with three or more going, I’d bet the hypervisor is spending more time on scheduling and preemption than on actual processing.
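The overcommitment arithmetic above can be sketched like this (assuming 8 physical cores per Xserve, 8 vCPUs per builder VM, and the best-case 50% hyperthreading gain mentioned earlier):

```python
# Assumed hardware figures from the discussion: 8 physical cores,
# hyperthreading adding at best ~50% extra headroom.
physical_cores = 8
ht_gain_best = 0.50
effective_cores = physical_cores * (1 + ht_gain_best)  # ~12 cores' worth

# Each builder VM is currently allocated 8 vCPUs.
vcpus_per_vm = 8
for active_builders in (1, 2, 3):
    committed = active_builders * vcpus_per_vm
    ratio = committed / effective_cores
    print(f"{active_builders} builder(s): {committed} vCPUs committed, "
          f"{ratio:.2f}x the effective core count")
```

With two builders active we have already committed 16 vCPUs against roughly 12 effective cores, which matches the "well overcommitted" claim above.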

By way of comparison, I’m running on a modest 2008-era Mac Pro, with only eight physical CPU cores… and no hyperthreading. Plus the Xeons in my Mac Pro are one major generation behind the Nehalem-based CPUs in our Xserves. Yet my port build times are anywhere from 2x to 10x faster than we’re seeing on our builders. (And no, that’s not an exaggeration.)

So we need to do something, as the buildbots simply can’t keep up.

Upgrading them to six-core Xeons would certainly help. But I’m confident we could also improve the situation by reducing the level of CPU overcommitment: cutting the vCPUs per VM would help, as we simply don’t have the physical CPU power to support eight per VM.


> On May 16, 2021, at 00:13, Ryan Schmidt <ryandesign at macports.org> wrote:
> 
>> On May 14, 2021, at 07:12, Christopher Nielsen wrote:
>> 
>> Since we’re overcommitting on CPU, I’m wondering if it would make sense to reduce the vCPUs in each VM to 4? In addition to reducing any swapping, that might also reduce the hypervisor context-switching overhead, and improve build times somewhat. (It’s been a while, but I think (?) there might be additional hypervisor overhead from overcommitment.)
> 
> My assumption is that that would have the opposite effect.
> 
> Not all commits or forced builds result in builds on every builder. Some ports are marked as known-fail on some OS versions; others aren't marked but still fail on some OS versions. And because of the way we currently have distfile mirroring set up, all builds trigger a mirror task, even if distfiles are already mirrored, so often several builds are waiting for mirroring tasks to complete. We want to use as many CPUs as possible so builds finish as quickly as possible. If we only end up starting one build, for example, we want it to use 8 CPUs. No point using only 4 CPUs and leaving the rest idle.
> 
> Using only 4 CPUs but still using 8 GB RAM would convey an assumption that each compile job needs up to 2 GB RAM. If that is a valid assumption, fix it in MacPorts base. It currently assumes one compile job needs up to 1 GB RAM. That assumption was coded a long time ago, but I think it's probably still close to the truth. There are a few ports where that assumption is not true, like tensorflow. We should enhance base to allow ports to override the assumption, like https://trac.macports.org/ticket/62554.
> 
> Yes, I'm sure overcommitment has some kind of overhead in the hypervisor. My assumption is that the developers of the hypervisor are smart and that the amount of overhead is small.
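The job-count assumption Ryan describes can be sketched as a simplified model (this is illustrative, not the actual MacPorts base code; the 1 GB-per-job figure and the tensorflow example are from his message):

```python
# Simplified model of the heuristic described above: cap parallel
# compile jobs by both CPU count and available RAM, assuming each
# job may need up to 1 GB of RAM. (Illustrative only, not MacPorts code.)
def build_jobs(cpu_count: int, ram_gb: int, gb_per_job: int = 1) -> int:
    return max(1, min(cpu_count, ram_gb // gb_per_job))

print(build_jobs(8, 8))                # 8: RAM matches the 1 GB/job assumption
print(build_jobs(4, 8))                # 4: CPU-bound
print(build_jobs(8, 8, gb_per_job=2))  # 4: a heavier port, e.g. tensorflow
```

A per-port override, as proposed in ticket #62554, would amount to letting a port supply its own `gb_per_job` value in this model.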

