GSoC 2019 [Collect build statistics]

Mojca Miklavec mojca at macports.org
Mon Apr 1 21:44:02 UTC 2019


Dear Arjun,

On Mon, 1 Apr 2019 at 18:38, Arjun Salyan
<arjun.salyan.che17 at itbhu.ac.in> wrote:
>
> Hi,
> I was working on keeping the PortIndex updated, and was able to achieve this:
>
> Sync Portindex from 'rsync://rsync.macports.org/macports//trunk/dports/PortIndex_darwin_16_i386/PortIndex'
> Update or Add ports that were recently built on 10.14_x86_64 (using time frame 'last 24 hours' for now).
> New ports, (SoapyAirspy, SoapyAirspyHF etc) were successfully added, and can now be seen on the demo app.
>
> This is exactly the approach I wrote in the proposal and I wanted to show a working demo, so that I can get feedback about how efficient this method is.
> The script I used: update_portindex.py . ( note: the code might not be very well written, I was just looking to get things working. Also, I am only updating ports built on '10.14_x86_64')

(It might have been easier to comment on the pull request, but I
noticed that those commits did not make it into the pull request.)

This is an interesting approach that should mostly work, but not
always and not very reliably.

The drawbacks may include:
- some ports will be skipped on the builder for various reasons (the
port is known not to build on a particular builder, it may not be
distributable, ...), so they never show up among the recent builds
- the buildbot master may be down or experience problems, so data
might go missing

One odd thing I noticed in your source code: you synced the PortIndex
and ran the conversion, but then loaded the data from a different JSON
file? Am I missing something?

There are various ways to achieve the goal. Note that if you run
portindex yourself, it will detect which files have been updated and
only ever touch the data of those ports. The portindex command could
be modified to write out a file containing only the changes (when you
pass some option to it). This would still miss deletes, but it would
be an efficient way with almost no dependencies.

One way would be to generate the PortIndex yourself, always remember
which git commit (shasum) it was generated from, and store that shasum
in the database. On the next update, fetch and store the latest
shasum, then ask git which paths have changed between the two commits,
and only update the ports whose paths match those reported by git.
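
A minimal sketch of that idea in Python, assuming the ports tree is a
local git checkout laid out as category/portname/... and that the old
and new shasums are already known. The function names and the rule for
skipping directories such as _resources are my own assumptions, not
something taken from your script:

```python
import subprocess
from pathlib import PurePosixPath


def changed_paths(repo_dir, old_sha, new_sha):
    """Ask git which files differ between two commits."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_sha, new_sha],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def ports_from_paths(paths):
    """Map changed file paths to sorted (category, port) pairs.

    Assumption: ports live under 'category/portname/...'. Top-level
    files and directories starting with '_' or '.' are ignored here.
    """
    ports = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and not parts[0].startswith(("_", ".")):
            ports.add((parts[0], parts[1]))
    return sorted(ports)
```

Only the ports returned by ports_from_paths(changed_paths(...)) would
then need to be refreshed in the database. A change under _resources
(a PortGroup, for example) can affect many ports and is deliberately
not handled in this sketch.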

It could also help if you stored a "complete" git history in the
database (shasum, which ports changed at that point, timestamp,
parents). Not sure if that's really so helpful, just as an option.

An interesting approach might be to try to squeeze the git shasum
into the PortIndex itself. This could also help when submitting
statistics, as it would be easier to determine how old the user's
ports tree is / when the user last synced. (It would not work for
people with their own modifications of the tree.) If you had the
shasum in the PortIndex, you could still run git independently to
check for the differences.

You could also keep the full PortIndex in git after each sync and
check the diffs. (Not sure if it would be super trivial to figure out
which ports changed, probably not.)
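
A rough sketch of a related variant: instead of parsing git's diff
output, compare two saved snapshots of the PortIndex directly. It
assumes the PortIndex consists of pairs of lines, a "name length"
header followed by one line of port info; please double check that
against a real file. The function names are made up for illustration:

```python
def parse_portindex(text):
    """Return {port name: raw info line} from PortIndex text.

    Assumption: entries are pairs of lines, 'name length' followed by
    a single line holding the port's info.
    """
    lines = text.splitlines()
    entries = {}
    for header, info in zip(lines[0::2], lines[1::2]):
        name = header.rsplit(" ", 1)[0]  # drop the trailing length
        entries[name] = info
    return entries


def diff_portindex(old_text, new_text):
    """Return (added, changed, deleted) port names between snapshots."""
    old = parse_portindex(old_text)
    new = parse_portindex(new_text)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return added, changed, deleted
```

Unlike the approaches above, this one would also catch deleted ports,
at the cost of reading both snapshots completely on every update.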

Just some random ideas.


Regarding updates of builds: just ask the database which build you
synced last, and then sync any builds newer than that, up to the
latest one. You may need to check whether a build was already complete
when you last enquired; if it was still running, fetch it again.
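
Something along these lines, as a sketch only. It assumes builds have
consecutive integer numbers starting at 1 and are stored in an SQLite
table; the table layout and the fetch_build callable (which would talk
to the buildbot) are hypothetical:

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) table that records synced builds."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS builds ("
        "number INTEGER PRIMARY KEY, complete INTEGER NOT NULL, status TEXT)"
    )


def first_build_to_fetch(conn):
    """Lowest build number that still needs (re)fetching."""
    row = conn.execute(
        "SELECT MIN(number) FROM builds WHERE complete = 0"
    ).fetchone()
    if row[0] is not None:
        return row[0]  # a build was unfinished at the last sync
    row = conn.execute("SELECT MAX(number) FROM builds").fetchone()
    return 1 if row[0] is None else row[0] + 1


def sync_builds(conn, fetch_build, latest_number):
    """Fetch and store builds up to latest_number; return their numbers.

    fetch_build(number) must return a dict with 'complete' (bool)
    and 'status' (str).
    """
    numbers = list(range(first_build_to_fetch(conn), latest_number + 1))
    for number in numbers:
        build = fetch_build(number)
        conn.execute(
            "INSERT OR REPLACE INTO builds (number, complete, status) "
            "VALUES (?, ?, ?)",
            (number, int(build["complete"]), build["status"]),
        )
    conn.commit()
    return numbers
```

Because the sync restarts from the oldest incomplete build, a build
that was still running during the previous sync gets refreshed, and a
buildbot outage only delays the data instead of losing it.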

Mojca

