Buildbot proposal: combine portwatcher and portbuilder

Sun Mar 11 22:48:30 UTC 2018

On 2018-03-11 10:25, Ryan Schmidt wrote:
> The current buildbot setup has a number of problems that I believe could be solved by combining the currently separate portwatcher and portbuilder schedulers into a single ports scheduler.
> 
> I am not suggesting that we return to the behavior of the ports scheduler on the old macOS forge buildbot system in which a single build would build all the specified ports. We will keep the current method of building only one port (and its dependencies) per build.
> 
> The problems I want to solve are the following:
> 
> 1. Currently, portwatcher is responsible for updating a copy of mpbb, MacPorts base and a ports tree that it shares with portbuilder. Having portbuilder maintain its own copy would waste a lot of time. If someone makes one commit that changes 100 ports and then no further commits occur for hours, we only want to update mpbb, MacPorts base and the ports tree once, not 100 times. But the fact that it's shared means that portwatcher must (and is configured to) wait for all triggered portbuilder builds to finish before it processes the next commit. This works fine, unless the buildmaster is stopped while portwatcher builds are pending. This has happened several times when the servers lost power during a power outage. (The servers are on a UPS, but the UPS does not provide as much instantaneous power as I expected, so if the servers are busy building, they draw more power than the UPS can instantaneously provide and the servers shut down immediately. I might remove the buildworker machines from the UPS and leave only the buildmaster, modem and router on it.) When buildmaster comes back online, it sees the portbuilder build that was in progress and starts it again, but it also sees the portwatcher build that was in progress and starts it again. Now we have a portwatcher running (updating mpbb, updating MacPorts, updating the ports tree, and updating its portindex) while portbuilder is trying to install a port. The portbuilder can fail if it is trying to install ports at the moment that portwatcher is updating the index (see https://trac.macports.org/ticket/53587).

Do the selfupdate/sync steps really hurt so much that we cannot just
run them on every portbuilder run? Looking at the portwatcher build you
linked as an example, these steps only took a few seconds in total. Why
can we not just move these steps to the portbuilder?

At the moment, portwatcher and portbuilder are sharing resources, but
buildbot assumes each builder is isolated and that leads to these
problems when resuming builds. I guess we should not do that...

> 2. If a single portwatcher build "X" triggers many portbuilder builds, and while those portbuilder builds are in progress another commit comes in that would affect those ports, it doesn't notice until all portbuilder builds triggered by "X" are finished. This can waste time building ports that are already superseded by newer versions or revisions. An extreme example of this is if we were to force a portwatcher build for all ports (which we might want to do when a new version of macOS is released). mpbb, MacPorts base and the ports tree would be updated once, and then it would schedule a portbuilder for each port that had not yet been built. Building all ports will take weeks. During that time, a commit may come through that updates a port to a new version. But if the build of the old version of the port was still pending at that time, the buildbot will build the old version, because it can't update the ports tree until the current portwatcher build is done waiting for its triggered portbuilder builds.

To me it seems like this is only an issue for forced builds, not for the
builds scheduled by commits.

So maybe for this use case the force scheduler should be on a level one
higher in the hierarchy. Then the force scheduler would get a list of
ports and for each schedule a portwatcher with only one port name (or
use some other way of partitioning).
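
To illustrate what I mean by partitioning, here is a rough sketch. The
function name and the chunk size parameter are made up for this example,
and how each chunk would actually be handed to a portwatcher is left
out, since that depends on how the force scheduler ends up being set up.

```python
# Sketch only: partition_ports is a hypothetical helper, not existing
# buildbot or mpbb code.

def partition_ports(portnames, size=1):
    """Split a list of port names into chunks of at most `size` ports."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [portnames[i:i + size] for i in range(0, len(portnames), size)]


# With the default size, each port would get its own portwatcher request.
for chunk in partition_ports(["zlib", "curl", "openssl"]):
    print(" ".join(chunk))
```

With one port per request, a later commit would only have to wait for
the currently running request instead of the whole forced list.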

> 3. When there are portwatcher builds pending, we have no idea how many portbuilder builds are pending. It may say there are e.g. 3 portbuilder builds pending, but the pending portwatchers could trigger any number of additional portbuilders.

Why does it matter how many portbuilder builds will be scheduled later?
I do not see the problem...?

Overall, my immediate thought was that with the current portwatcher, we
could just not wait for the triggered builds to finish. That seems to
solve (1) and (2), but we would also lose the ability to send summary
emails. I have not thought enough about whether this would really work,
though.

Am I missing a reason why we would definitely need to merge portwatcher
and portbuilder?

> An objection to this proposal was that buildbot 0.8 does not have the capability to dynamically create scheduler steps at runtime. But that's not required and that's not what I'm proposing.
> 
> Buildbot has the ability to call a function for each step to determine if that step should run, by specifying the doStepIf property. I recently started using this feature in portwatcher to skip the two trigger steps if there are no ports in the port list:
> 
> https://github.com/macports/macports-infrastructure/commit/18135d6c75698f88b48698473c9364063fb6fba9
> 
> Here is an example of what that looks like when it runs:
> 
> https://build.macports.org/builders/ports-10.13_x86_64-watcher/builds/3989
> 
> The only port that was committed there had already been built so it was excluded from the port list, leaving the list empty, so the portbuilder trigger step was skipped (to save time) and the mirror trigger step was skipped (to prevent it from printing an error that no ports were specified). The skipped steps are still shown in the web interface, but if that's not desired, they can be hidden by also using the hideStepIf property.

I noticed this and I like this change. +1
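
For anyone who has not looked at the commit, the idea can be sketched
roughly as follows. This is not the code from the commit: the property
name "portlist" and the FakeStep class are assumptions for illustration.
The only part taken from buildbot is that doStepIf accepts a callable
that receives the step and returns a boolean.

```python
# Sketch only: "portlist" is an assumed property name, and FakeStep stands
# in for a real buildbot step so the predicate can be tried without a
# buildmaster.

class FakeStep(object):
    def __init__(self, properties):
        self._properties = dict(properties)

    def getProperty(self, name, default=None):
        return self._properties.get(name, default)


def has_ports(step):
    """Return True if the build's port list is non-empty."""
    portlist = step.getProperty("portlist", "") or ""
    return bool(portlist.split())


print(has_ports(FakeStep({"portlist": "zlib curl"})))  # True
print(has_ports(FakeStep({"portlist": ""})))           # False
```

In a real master.cfg, such a function would be passed as
doStepIf=has_ports on the trigger steps.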

> So my proposed combined ports scheduler would still contain all of the steps of the current portwatcher and portbuilder schedulers, but each build would still conceptually "be" either a portwatcher or a portbuilder, and for each build, the steps that don't relate to that conceptual function would be skipped and hidden.

> [...]

It sounds like a complete hack to use doStepIf this way...

> This should solve problem (1). By having portwatcher and portbuilder tasks in the same queue, they can't run simultaneously so they can't cause the problems that happen when they run simultaneously.
>
> Solving (2) and (3) requires an additional step. Buildbot allows you to define a nextBuild property when you create the builder, and pass it a function that determines which of the pending builds should go next.
> 
> http://docs.buildbot.net/0.8.12/manual/cfg-builders.html#builder-configuration
> 
> We would write a function that looks through the pending builds in the order in which they were scheduled, and picks the first one that is a portwatcher (the first one that doesn't have a "portname" property). If none are portwatchers, it picks the first* one. (* This is where we would later improve the situation to pick the next port in the correct dependency order, but I have another email about that.)
> 
> This solves (2) because now when a new commit comes in, a new "portwatcher" gets added to the end of the ports scheduler's queue, but it will be the next build picked immediately after the current "portbuilder" finishes, no matter how many other "portbuilders" are still pending. That will update the ports tree, so any pending builds for ports that were subsequently updated will build the now-current version, not the old version.

But the old builds would be left in the queue and follow after the
portwatcher, so we still have multiple builds in the queue for the same
port...
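
To make this concrete, here is a sketch of the selection rule as I
understand it from the proposal. The Request class is a stand-in for
buildbot's BuildRequest, and storing properties in a plain dict is an
assumption to keep the example runnable on its own. I also assume that
the list of pending requests arrives in scheduling order.

```python
# Sketch only: Request is a stand-in for buildbot's BuildRequest, and
# reading "portname" from a plain dict is an assumption.

class Request(object):
    def __init__(self, name, properties=None):
        self.name = name
        self.properties = dict(properties or {})


def is_portwatcher(request):
    # Per the proposal, a "portwatcher" build has no "portname" property.
    return "portname" not in request.properties


def next_build(builder, requests):
    """Pick the oldest pending portwatcher, else the oldest request."""
    for request in requests:  # assumed to be in scheduling order
        if is_portwatcher(request):
            return request
    return requests[0] if requests else None


pending = [
    Request("build zlib (old commit)", {"portname": "zlib"}),
    Request("build curl (old commit)", {"portname": "curl"}),
    Request("watcher for new commit"),
]
chosen = next_build(None, pending)
print(chosen.name)  # watcher for new commit
pending.remove(chosen)
print([r.name for r in pending])  # both old requests are still queued
```

Nothing in this rule removes or replaces the older zlib request. If the
new commit also touches zlib, the watcher would presumably add a second
request for it.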

As the builds were scheduled by an earlier commit (which buildbot keeps
in the "sourcestamp"), the attribution for notifications might be
difficult. I am afraid it could lead to reports of a build failure for
the wrong commit unless you manage to cancel the pending builds.

Buildbot would normally solve this by merging build requests. If a new
build request gets put into the queue, the old one is discarded. The
problem is that we cannot merge portwatcher builds, because we would
only be able to do that as soon as we know the list of ports. However,
we can only determine that once the portwatcher build is executed, so it
is not an option for us.

https://docs.buildbot.net/0.8.12/manual/cfg-builders.html#merging-build-requests

If the portwatcher triggered the portbuilder and then exited, we could
look into merging the portbuilder jobs.
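
A merge rule for that case might look roughly like the sketch below.
The buildbot 0.8 documentation linked above describes a mergeRequests
callable that takes the builder and two requests and returns a boolean.
Everything else here is an assumption: the Request stand-in, the plain
dict of properties, and the rule itself.

```python
# Sketch only: Request is a stand-in for buildbot's BuildRequest, and the
# "same portname" rule is a suggestion, not existing configuration.

class Request(object):
    def __init__(self, properties):
        self.properties = dict(properties)


def merge_requests(builder, req1, req2):
    """Allow merging only for two portbuilder requests for the same port."""
    port1 = req1.properties.get("portname")
    port2 = req2.properties.get("portname")
    return port1 is not None and port1 == port2


old_zlib = Request({"portname": "zlib"})
new_zlib = Request({"portname": "zlib"})
curl = Request({"portname": "curl"})
print(merge_requests(None, old_zlib, new_zlib))  # True
print(merge_requests(None, old_zlib, curl))      # False
```

Whether a merged build would then be attributed to the right commit for
notifications is not something this sketch answers.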

> It also solves (3) because now if 100 "portbuilder" builds are pending, we don't have to wait until all of them are finished before we know how many builds all the pending "portwatcher" builds will schedule; we only have to wait until the current "portbuilder" finishes.

> One drawback, depending on how you look at it, is that we would no longer be able to send a single combined email for all of the failed builds of a single commit, or of a single forced build. We would have to send an individual email for each failed build. Personally, I would prefer that, as the subject line of the email would make clear which port failed to build, rather than requiring me to open it to see what happened.

Didn't we invest a lot of effort to find a way to only send a single
email instead of sending one per build...?

> If we give the new combined scheduler a new name like "ports", that will invalidate all old links to build logs. That's not a deal-breaker for me, but since we often paste build log URLs into tickets, it would be nice to keep them alive if we can. Maybe defining empty portbuilder and portwatcher schedulers would be enough.

I would not care about the logs at all. They are only interesting for a
limited time and only for failed builds. It should be no problem to just
delete old logs.

Rainer