Buildbot Performance
Ryan Schmidt
ryandesign at macports.org
Fri May 14 04:54:42 UTC 2021
On May 12, 2021, at 07:41, Christopher Nielsen wrote:
>
> On 2021-05-12-W, at 08:32, Christopher Nielsen wrote:
>
>> Looking at the build times for various ports, it varies significantly.
>>
>> I was curious, are we overcommitting virtual CPUs vs. the number of available physical cores on our Xserves? And is disk swapping coming into play, within the VMs themselves?
For most ports, I don't believe swapping within the VM occurs. py-tensorflow and other ginormous ports that exceed MacPorts' expectations about compiler memory use are exceptions. On the VMware ESXi side, all VMs have 100% of their RAM reserved, so no swapping occurs there.
> To clarify my question about overcommitment: Are the total number of virtual CPUs for the buildbot VMs running on a given Xserve, greater than the number of physical CPU cores available?
Yes, we are overcommitting CPU. Each VM has 8 virtual CPUs (the maximum VMware ESXi allows without a paid license) and typically 8 GB RAM (except the 10.6 i386 builder, which only has 4 CPUs and 4 GB RAM, the maximum for 32-bit). Each Xserve has two 4-core processors (8 physical cores), presenting as 16 logical cores with Hyper-Threading. Normally we would have 3-4 VMs on each Xserve:
R (2.66GHz 32GB): 10.6i, 10.6x, 10.9, 10.15 (SSD)
A (2.26GHz 32GB): 10.7, 10.10, 10.13 (SSD), backup (HD)
S (2.26GHz 32GB): 10.8 (HD), 10.11 (HD), 10.14 (HD)
M (2.26GHz 27GB): 10.12, 11x, buildmaster/files, buildmaster2 (SSD)
Server R's fan array failed two weeks ago. I turned off server A and put its fan array, SSD and RAM into server R, so it now runs 7 build VMs plus the backup VM:
R (2.66GHz 64GB): 10.6i, 10.6x, 10.9, 10.15 (SSD), 10.7, 10.10, 10.13 (SSD), backup (HD)
S (2.26GHz 32GB): 10.8 (HD), 10.11 (HD), 10.14 (HD)
M (2.26GHz 27GB): 10.12, 11x, buildmaster/files, buildmaster2 (SSD)
When they're all fully busy, certainly that will be slower than when more CPUs were available on the two separate servers. But this seemed to be working pretty well, up until the huge batches of builds a few days ago (my updates to 3 PHP versions' worth of subports, followed by gcc updates and forced builds of everything depending on gcc), which have resulted in a backlog (on all servers, even those that were not consolidated).
Redistributing the VMs to make it more balanced is possible, for example:
R (2.66GHz 48GB): 10.6i, 10.6x, 10.9, 10.15 (SSD), 10.11 (HD), backup (HD)
A (2.26GHz 48GB): 10.7, 10.10, 10.13 (SSD), 10.8 (HD), 10.14 (HD)
M (2.26GHz 27GB): 10.12, 11x, buildmaster/files, buildmaster2 (SSD)
After seeing that using just 3 servers seemed to work, I had planned to do this, but of course I need to wait until the builders are idle.
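To put rough numbers on the CPU overcommitment in the three layouts above, here is a minimal sketch. It assumes 8 vCPUs per build VM (4 for the 10.6 i386 builder) and 8 physical / 16 logical cores per Xserve, as described earlier. The backup and buildmaster VMs are left out because their vCPU counts are not stated here, so the real ratios would be somewhat higher. The names `layouts`, `vcpus` and `overcommit` are made up for this illustration.

```python
# Rough vCPU overcommit estimate per server, for build VMs only.
# Assumption: backup and buildmaster VMs are excluded (vCPU counts not stated).
PHYSICAL_CORES = 8   # two 4-core processors per Xserve
LOGICAL_CORES = 16   # as presented with Hyper-Threading


def vcpus(builder):
    """8 vCPUs per build VM, except the 10.6 i386 builder, which has 4."""
    return 4 if builder == "10.6i" else 8


layouts = {
    "original": {
        "R": ["10.6i", "10.6x", "10.9", "10.15"],
        "A": ["10.7", "10.10", "10.13"],
        "S": ["10.8", "10.11", "10.14"],
        "M": ["10.12", "11x"],
    },
    "consolidated": {
        "R": ["10.6i", "10.6x", "10.9", "10.15", "10.7", "10.10", "10.13"],
        "S": ["10.8", "10.11", "10.14"],
        "M": ["10.12", "11x"],
    },
    "proposed": {
        "R": ["10.6i", "10.6x", "10.9", "10.15", "10.11"],
        "A": ["10.7", "10.10", "10.13", "10.8", "10.14"],
        "M": ["10.12", "11x"],
    },
}


def overcommit(builders):
    """Return (total vCPUs, ratio to physical cores, ratio to logical cores)."""
    total = sum(vcpus(b) for b in builders)
    return total, total / PHYSICAL_CORES, total / LOGICAL_CORES


for name, servers in layouts.items():
    for server, builders in servers.items():
        total, per_phys, per_logical = overcommit(builders)
        print(f"{name:13} {server}: {total:2d} vCPUs, "
              f"{per_phys:.1f}x physical, {per_logical:.2f}x logical")
```

By this estimate, the build VMs on server R go from 28 vCPUs originally to 52 when consolidated, and the proposed layout would put 36 on R and 40 on A.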
Replacing the failed fan array and going back to 4 servers is possible, though I do like the idea of running fewer servers (less noise, less electricity).
Replacing server S's hard drives with an SSD is possible.
Upgrading the CPUs in one or more servers to faster, more efficient 6-core Westmere models is possible.