[MacPorts] #60178: Don't use SSDs for buildbot workers
MacPorts
noreply at macports.org
Thu May 21 02:57:47 UTC 2020
#60178: Don't use SSDs for buildbot workers
-----------------------------+---------------------
  Reporter:  ryandesign      |      Owner:  admin@…
      Type:  defect          |     Status:  new
  Priority:  Normal          |  Milestone:
 Component:  server/hosting  |    Version:
Resolution:                  |   Keywords:
      Port:                  |
-----------------------------+---------------------
Comment (by ryandesign):
I have been meaning to reply to this ticket since creating it.
Replying to [ticket:60178 fhgwright]:
> one should never rely on SSDs for primary storage
That assertion needs at least some qualification. Apple has been shipping
Macs with SSDs as primary storage for years; clearly they think SSDs can be
relied upon. And VMware specifically supports using SSDs as primary VM
storage, so they clearly think that is acceptable in at least some cases.
We can certainly discuss whether using it in our case was the right
choice.
Our previous buildbot setup was at macOS forge where two Xserves (for
buildbot workers) and six HP servers (for everything else) connected via
10Gbit Ethernet to two expensive NetApp filers each with dozens of hard
disks in a RAID. In mid 2016 when we needed to set up our own hardware to
replace what we were losing due to the closure of macOS forge, I lent
MacPorts my four Xserves to use as buildbot machines. I initially tried to
find a RAID shared storage solution, some kind of low-end version of what
the NetApp filers had offered us, but I could find nothing that was
remotely affordable. And MacPorts does not have a revenue stream, is not a
legal organization, and is not set up to accept donations. We do have some
funds from our participation in Google Summer of Code, but at the time we
did not consider using those funds to purchase a RAID; I don't think the
funds would have been sufficient in any case. Instead I decided to forgo
shared storage and use local storage on each Xserve. I believed that
trying to host three heavy-duty VMs off a single hard disk would be a
tremendous reduction of performance compared to what we were used to from
the NetApp filers, so I purchased SSDs for the three Xserves we would use
as workers. (The fourth Xserve already had an Apple SSD and an Apple hard
disk RAID and is used as the buildmaster and fileserver.) I have been very
pleased with the performance of the worker VMs running on these SSDs.
It is likely that our buildbot workers incur more writes to these SSDs
than the average workload would. It is possible that the VMs do not see
their virtual disks as SSDs; or that, even if they do, macOS is not
issuing TRIM commands to them (since it has been Apple's custom to only
issue TRIM commands to original Apple SSDs); or that any TRIM commands the
VMs do issue to those virtual disks are not being passed through to the
physical SSDs. Any of these would have a write amplification effect,
further increasing the wear on the SSDs and reducing their life
expectancy.
After losing one of our VM hosts' SSDs in February 2020, I recreated the
VMs on hard disks. I used a separate hard disk for each VM. I guess the
performance has not been terrible, but I haven't actually looked at the
build times to compare them with the VMs that are still on SSD. I have
definitely noticed slower startup times.
In the meantime, in May 2020 we lost a second SSD in a manner very
similar to the first. I haven't rebuilt those VMs yet.
I was using Samsung SM951 AHCI 512GB SSDs in M.2 PCIe adapters. Their
endurance is rated at 150TBW. They were put into service in mid-2016, so
they've had 3½–4 years of use. Maybe with the way that we are using them
they've simply been used up. Or maybe they failed for unrelated reasons.
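
As a rough back-of-the-envelope check on the "used up" possibility, the
sketch below works out what average write rate would consume a 150TBW
rating in 3.5 to 4 years. It assumes decimal units (1 TB = 1000 GB) and
writes spread evenly over the service period. The actual volume written
to these drives was not measured, and the function name is made up for
illustration.

```python
# Hypothetical helper: average host writes per day (in GB) that would
# consume a drive's rated endurance over a given number of years.
def daily_writes_gb(rated_tbw: float, years: float) -> float:
    total_gb = rated_tbw * 1000   # assumption: decimal TB, 1 TB = 1000 GB
    days = years * 365            # ignores leap days
    return total_gb / days


# 150 TBW is the rating quoted above; 3.5 and 4 years bracket the
# stated service life.
for years in (3.5, 4.0):
    rate = daily_writes_gb(150, years)
    print(f"{years} years: about {rate:.0f} GB/day")
```

That comes to roughly 100 to 120 GB per day per SSD, shared among the
three worker VMs on each host, before accounting for any write
amplification.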
Despite the trouble we had with these SSDs, I intend to get replacement
SSDs and to transition back off of the temporary hard disks. Many current
512GB NVMe SSDs are rated at 300TBW, on which basis they should last
twice as long as the old ones. Unlike last time, I intend to overprovision
the new SSDs to extend their life even further. And I will investigate
using trimforce or other similar options to force macOS to issue TRIM
commands to reduce write amplification. And maybe we can find ways to
modify our buildbot / mpbb setup to reduce writes.
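
To illustrate the "twice as long" reasoning and the effect of reducing
writes, here is a minimal sketch that projects drive life from a rated
endurance and an average daily write volume. The 110 GB/day figure is an
assumption, chosen only because it would exhaust 150TBW in a little under
four years. It is not a measurement, and the function name is
hypothetical.

```python
# Hypothetical helper: years until the rated endurance is consumed at a
# constant average write rate.
def years_of_life(rated_tbw: float, daily_gb: float) -> float:
    total_gb = rated_tbw * 1000        # assumption: 1 TB = 1000 GB
    return total_gb / (daily_gb * 365)


ASSUMED_DAILY_GB = 110  # illustrative assumption, not a measured value

old = years_of_life(150, ASSUMED_DAILY_GB)
new = years_of_life(300, ASSUMED_DAILY_GB)
# Halving the write volume (for example through fewer build writes or
# lower write amplification) doubles the projected life again.
new_half = years_of_life(300, ASSUMED_DAILY_GB / 2)

print(f"150 TBW at {ASSUMED_DAILY_GB} GB/day: {old:.1f} years")
print(f"300 TBW at {ASSUMED_DAILY_GB} GB/day: {new:.1f} years")
print(f"300 TBW at {ASSUMED_DAILY_GB / 2:.0f} GB/day: {new_half:.1f} years")
```

The projection is linear in both inputs, so doubling the rating and
halving the writes each double the expected life. It says nothing about
failures unrelated to wear.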
I'm not expecting NVMe to give us much of a speed boost over AHCI PCIe,
especially since we're limited to the Xserve's PCIe 2.0 bus. It's just
that AHCI PCIe SSDs are impossible to find now; they're all either AHCI
SATA, which is slow, or NVMe PCIe. Xserve EFI firmware is not compatible
with NVMe, but I've found instructions for modifying the firmware to make
it NVMe compatible, so that's what I'm intending to do at the moment.
--
Ticket URL: <https://trac.macports.org/ticket/60178#comment:6>
MacPorts <https://www.macports.org/>
Ports system for macOS