[MacPorts] #60178: Don't use SSDs for buildbot workers

MacPorts noreply at macports.org
Thu May 21 02:57:47 UTC 2020


#60178: Don't use SSDs for buildbot workers
-----------------------------+---------------------
  Reporter:  ryandesign      |      Owner:  admin@…
      Type:  defect          |     Status:  new
  Priority:  Normal          |  Milestone:
 Component:  server/hosting  |    Version:
Resolution:                  |   Keywords:
      Port:                  |
-----------------------------+---------------------

Comment (by ryandesign):

 I have been meaning to reply to this ticket since creating it.

 Replying to [ticket:60178 fhgwright]:
 > one should never rely on SSDs for primary storage

 That assertion needs at least some qualification. Apple has been shipping
 Macs with SSDs as primary storage for years, so clearly they think SSDs
 can be relied upon. VMware specifically supports using SSDs as primary VM
 storage, so they evidently consider that acceptable in at least some
 cases. We can certainly discuss whether using them in our case was the
 right choice.

 Our previous buildbot setup was at macOS forge where two Xserves (for
 buildbot workers) and six HP servers (for everything else) connected via
 10Gbit Ethernet to two expensive NetApp filers each with dozens of hard
 disks in a RAID. In mid-2016, when we needed to set up our own hardware to
 replace what we were losing due to the closure of macOS forge, I lent
 MacPorts my four Xserves to use as buildbot machines. I initially tried to
 find a RAID shared storage solution, some kind of low-end version of what
 the NetApp filers had offered us, but I could find nothing that was
 remotely affordable. And MacPorts does not have a revenue stream, is not a
 legal organization, and is not set up to accept donations. We do have some
 funds from our participation in Google Summer of Code, but at the time we
 did not consider using those funds to purchase a RAID; I don't think the
 funds would have been sufficient in any case. Instead I decided to forgo
 shared storage and use local storage on each Xserve. I believed that
 trying to host three heavy-duty VMs off a single hard disk would mean a
 tremendous reduction in performance compared to what we were used to from
 the NetApp filers, so I purchased SSDs for the three Xserves we would use
 as workers. (The fourth Xserve already had an Apple SSD and an Apple hard
 disk RAID and is used as the buildmaster and fileserver.) I have been very
 pleased with the performance of the worker VMs running on these SSDs.

 It is likely that our buildbot workers incur more writes to these SSDs
 than the average workload would. It is also possible that TRIM is not
 reaching the physical SSDs: the VMs may not see their virtual disks as
 SSDs; even if they do, macOS may not be issuing TRIM commands to them
 (since it has been Apple's custom to issue TRIM commands only to original
 Apple SSDs); or any TRIM commands the VMs do issue to those virtual disks
 may not be passed through to the physical SSDs. Any of these would
 increase write amplification, further increasing the wear on the SSDs and
 reducing their life expectancy.

 After losing one of our VM hosts' SSDs in February 2020, I recreated the
 VMs on hard disks, using a separate hard disk for each VM. The performance
 has not seemed terrible, but I haven't actually compared the build times
 with those of the VMs that are still on SSD. I have definitely noticed
 slower startup times.

 In the meantime, in May 2020 we lost a second SSD in a manner very
 similar to the first. I haven't rebuilt those VMs yet.

 I was using Samsung SM951 AHCI 512GB SSDs in M.2 PCIe adapters. Their
 endurance is rated at 150TBW. They were put into service in mid-2016, so
 they've had 3½–4 years of use. Maybe with the way that we are using them
 they've simply been used up. Or maybe they failed for unrelated reasons.
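
 As a rough sanity check on the "used up" hypothesis, the following
 minimal sketch works out what average daily write volume would exhaust a
 150TBW rating in 3½ to 4 years. It assumes decimal units (1 TB = 1000 GB)
 and 365.25 days per year, and the function name is purely illustrative.
 It does not show that the drives actually reached their rating; it only
 shows what write rate that would have required.

```python
# Back-of-envelope: average host writes per day that would exhaust a
# drive's rated endurance (TBW) over a given service period.
# Assumptions: 1 TB = 1000 GB, 365.25 days per year, and that the drive
# reached exactly its rated TBW (which is not known for these drives).

DAYS_PER_YEAR = 365.25


def implied_gb_per_day(rated_tbw: float, years_in_service: float) -> float:
    """Return the average GB/day that uses up rated_tbw in the given years."""
    return rated_tbw * 1000 / (years_in_service * DAYS_PER_YEAR)


if __name__ == "__main__":
    for years in (3.5, 4.0):
        rate = implied_gb_per_day(150, years)
        print(f"{years} years in service: about {rate:.0f} GB/day")
```

 Under those assumptions the answer is roughly 103 to 117 GB written per
 day, per host SSD, shared by the three worker VMs on that host.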

 Despite the trouble we had with these SSDs, I intend to get replacement
 SSDs and to move off the temporary hard disks. Many current 512GB NVMe
 SSDs are rated at 300TBW, so on that basis they should last twice as long
 as the old ones. Unlike last time, I intend to overprovision the new SSDs
 to extend their life even further. I will also investigate using
 trimforce or similar options to force macOS to issue TRIM commands, which
 should reduce write amplification. And maybe we can find ways to modify
 our buildbot / mpbb setup to reduce writes.
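
 The "twice as long" estimate follows from endurance scaling linearly with
 the TBW rating at a fixed write rate. A minimal sketch of that arithmetic
 is below. The 100 GB/day figure and the 25% write reduction are
 hypothetical placeholders, not measurements from the buildbot workers,
 and the effects of overprovisioning and TRIM are not modeled.

```python
# Projected drive life at a constant host write rate.
# Assumptions: 1 TB = 1000 GB, 365.25 days per year, and the drive lasts
# exactly until host writes reach its rated TBW.

DAYS_PER_YEAR = 365.25


def projected_years(rated_tbw: float, gb_per_day: float) -> float:
    """Return years until cumulative host writes reach rated_tbw."""
    return rated_tbw * 1000 / (gb_per_day * DAYS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical write rate, chosen only for illustration.
    hypothetical_gb_per_day = 100.0

    for tbw in (150, 300):
        years = projected_years(tbw, hypothetical_gb_per_day)
        print(f"{tbw} TBW at {hypothetical_gb_per_day:.0f} GB/day: "
              f"{years:.1f} years")

    # Hypothetical 25% reduction in writes from buildbot / mpbb changes.
    reduced = hypothetical_gb_per_day * 0.75
    print(f"300 TBW at {reduced:.0f} GB/day: "
          f"{projected_years(300, reduced):.1f} years")
```

 Whatever the real write rate turns out to be, doubling the rating doubles
 the projected life, and any reduction in writes extends it in proportion.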

 I'm not expecting NVMe to give us much of a speed boost over AHCI PCIe,
 especially since we're limited to the Xserve's PCIe 2.0 bus. It's just
 that AHCI PCIe SSDs are impossible to find now; current SSDs are all
 either AHCI SATA, which is slow, or NVMe PCIe. The Xserve EFI firmware is
 not compatible with NVMe, but I've found instructions for modifying the
 firmware to make it NVMe-compatible, so that's what I intend to do at the
 moment.

-- 
Ticket URL: <https://trac.macports.org/ticket/60178#comment:6>
MacPorts <https://www.macports.org/>
Ports system for macOS

