GSoC 2019 [Collect build statistics]

Mojca Miklavec mojca at macports.org
Thu Mar 28 06:35:08 UTC 2019


On Wed, 27 Mar 2019 at 17:13, Arjun Salyan wrote:
>>
>> - A more elaborate plan about how you plan to handle updates / keep
>> the database up-to-date. Sure, we can trigger certain actions from the
>> buildbot, but those various "actions" need to be implemented. Keeping
>> the app up to date in a safe and reliable way is a very important part
>> of the project, and requires collecting data from various sources.
>> "Look for the most efficient ways to keep the PortIndex and Build
>> History up-to-date" should be already attempted now.
>
>
> We can keep the build history up-to-date by using HttpStatusPush, which
> I read about in the buildbot documentation. It sends a JSON object containing build data.

OK, write that down (see below).

> This would even remove the need for a parsing script on the web app's end that fetches
> the logs from buildbot.

What if there's a server outage?

> But I am having trouble finding a good method to keep PortIndex updated.
> PortIndex does not assign an id to each port, so suppose I assign them ids in the
> database. Then if a port is renamed, it would be impossible to identify which port
> was renamed, because PortIndex has no idea about the ids in the database.
>
> Another problem is the size of the file: running portindex2json.tcl over the
> generated PortIndex every time and then looking for changes does not appear to be
> very efficient. Neither does the build page seem to provide any relevant info about
> the changes.

Sure, the build page doesn't provide relevant info about the changes:
the future app should.

> Any suggestions on tackling these would be very helpful.

(1) Identify all the individual items that will need to be updated and
write them down. Fetching from PortIndex and fetching builds are two
items, but not the only ones. We also want to know which ports have
been updated upstream, which websites seem broken, and more. For each
item, suggest how frequently it should be refreshed (checking for
upstream updates can definitely run less often than fetching the build
status, etc.).

(2) Think about different scenarios:
- how to update as soon as a change arrives (immediately after new
commits happen or builds are done ...)
- how to properly handle cases when there was a server outage, or
there was an error while updating, and "live data" went missing
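
To make the outage scenario concrete, here is a minimal Python sketch,
not a finished design. It assumes that builds are identified by a
(builder, number) pair, that build numbers per builder are sequential
and start at 1, and that the app can somehow learn the latest build
number from the buildbot. The table layout, the function names and the
builder name in the usage note are all hypothetical.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) build table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS build ("
        " builder TEXT NOT NULL, number INTEGER NOT NULL,"
        " status TEXT NOT NULL, PRIMARY KEY (builder, number))"
    )
    return db

def record_build(db, builder, number, status):
    """Store one build; storing the same build twice is harmless."""
    db.execute(
        "INSERT OR REPLACE INTO build (builder, number, status)"
        " VALUES (?, ?, ?)",
        (builder, number, status),
    )

def missing_builds(db, builder, latest_number, first_number=1):
    """Return build numbers up to latest_number that never reached the db.

    Assumption: numbers are sequential per builder, from first_number.
    """
    rows = db.execute(
        "SELECT number FROM build WHERE builder = ?", (builder,)
    )
    known = {number for (number,) in rows}
    return [n for n in range(first_number, latest_number + 1)
            if n not in known]
```

For example, if builds 1, 2 and 5 of a builder were pushed but 3 and 4
were lost during an outage, missing_builds(db, "some-builder", 5)
returns [3, 4]. Those can then be fetched again and passed to
record_build, which does not care whether a build arrives via a live
push or via a later catch-up run.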

(3) The database needs to be designed in such a way (and the software
needs to be written in such a way) that a frequent update from the full
portindex2json output:
    (a) works correctly (ports missing from PortIndex are marked as
gone, no duplicate entries of ports, all info up-to-date)
    (b) works super efficiently
    (c) works with minimal overhead
If network speed is the bottleneck, make sure that you feed / update
the database from the same machine where the database is running.
Updating via git is super fast; you want to avoid transferring the
full 20MB file over the network over and over again. Even if the
testing system is running in a strange configuration, describe the
architecture as it would ideally be implemented if you could design
the whole system yourself.
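
A rough Python sketch of requirement (a), under stated assumptions:
the portindex2json output has already been parsed into a list of dicts
with "name" and "version" keys (the real output has more fields), the
port name serves as the unique key, and "gone" is modelled with an
"active" flag. The table and function names are hypothetical.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) port table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS port ("
        " name TEXT PRIMARY KEY, version TEXT NOT NULL,"
        " active INTEGER NOT NULL DEFAULT 1)"
    )
    return db

def sync_ports(db, ports):
    """Make the port table reflect `ports`, a full list of port dicts."""
    seen = set()
    with db:  # one transaction: the whole sync applies, or none of it
        for port in ports:
            seen.add(port["name"])
            # The primary key on name rules out duplicate entries.
            db.execute(
                "INSERT OR IGNORE INTO port (name, version) VALUES (?, ?)",
                (port["name"], port["version"]),
            )
            db.execute(
                "UPDATE port SET version = ?, active = 1 WHERE name = ?",
                (port["version"], port["name"]),
            )
        # Read the active names into a list before issuing more updates.
        active = [name for (name,) in
                  db.execute("SELECT name FROM port WHERE active = 1")]
        for name in active:
            if name not in seen:
                # Missing from the index: mark as gone, keep its history.
                db.execute(
                    "UPDATE port SET active = 0 WHERE name = ?", (name,)
                )
```

Running sync_ports twice with the same input leaves the table
unchanged, so a repeated or interrupted run does no harm. This sketch
touches every row on every run, which does not yet address (b) and (c).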

(4) Suggest a way to minimize the data transfer, so that it will only
include the changes rather than the full data set. How to get such
data? What would need to be changed / improved?
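
One possible direction, sketched in Python: ask git which files changed
between the last processed commit and the new one, and only reprocess
the affected ports. This assumes a local clone of the ports tree laid
out as category/portname/..., with top-level directories starting with
"_" (such as _resources) not containing ports. The function names are
hypothetical, and how to regenerate the data for just those ports is
left open.

```python
import subprocess

def git_changed_files(repo, old_rev, new_rev):
    """Return the output of `git diff --name-only old_rev new_rev`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", old_rev, new_rev],
        cwd=repo, check=True,
        stdout=subprocess.PIPE, universal_newlines=True,
    )
    return result.stdout

def changed_ports(diff_output):
    """Map a list of changed file paths to a sorted list of port names."""
    ports = set()
    for line in diff_output.splitlines():
        parts = line.strip().split("/")
        # Expect category/portname/<file>; skip top-level files and
        # directories such as _resources that hold no ports.
        if len(parts) >= 3 and not parts[0].startswith("_"):
            ports.add(parts[1])
    return sorted(ports)
```

With this, a commit that touches devel/cmake/Portfile and
devel/cmake/files/some.patch yields just ["cmake"], so only that one
port needs to be looked at again instead of the full data set.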

(5) You won't be getting port renames. What you do get is
"replaced_by" information at best (say, perl5.26 could be replaced_by
perl5.28). When a port is renamed, treat it as a different port, but
the old port could be marked as "inactive" and "replaced_by <which
port>" (if it's not deleted yet). This information is probably not in
PortIndex; either portindex would need to be improved, or you need to
find a different way.
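
Once "replaced_by" information is available from somewhere, using it
could look like the Python sketch below. It assumes the data has been
collected into a plain dict from old port name to replacement name;
where that dict comes from is exactly the open question above. The
function name is hypothetical, and any port names beyond perl5.26 and
perl5.28 are only illustrative.

```python
def resolve_replacement(name, replaced_by):
    """Follow replaced_by links from `name` to its final replacement.

    Returns `name` itself if the port has not been replaced.
    """
    seen = {name}
    current = name
    while current in replaced_by:
        current = replaced_by[current]
        if current in seen:
            # Broken data: a chain of replacements loops back on itself.
            raise ValueError("replaced_by cycle involving " + current)
        seen.add(current)
    return current
```

With {"perl5.26": "perl5.28"} as input, resolve_replacement("perl5.26",
...) returns "perl5.28", so a page for the inactive port can point
users at the port that replaced it.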

Mojca

