Gsoc 18 Project | Collect build statistics

Mojca Miklavec mojca at macports.org
Fri Mar 23 05:26:05 UTC 2018


Dear Vishnu,

Thank you very much for reaching out to us.

Just one general remark since the deadlines are approaching very
quickly: a very important part of our GSOC selection process is to
prove to us that you can do a particular task. This may include some
pull requests, a small demo prototype that you could show us (that might
be most suitable in this case?), some references to your past work,
... You should provide such information & some sample prototypes as
soon as possible (but it can be a few days after proposal submission
deadline since writing a good proposal is the top priority right now).
Also super important: you should provide a draft proposal at least a
few days before the deadline (it doesn't necessarily have to be
submitted to the GSOC website, it's totally up to you, it can be
anywhere where we could read it). You should absolutely not submit
your first and final proposal one hour before the deadline. I would
generally have suggested submitting it 2 weeks before the deadline, but
that's no longer possible.

I'm writing this email in a slightly longer form than usual because
some of these ideas were clarified only last week and it makes sense
to get feedback from a broader community. So I'm not writing it
exclusively for you :) :) :)


On 22 March 2018 at 20:25, Vishnu wrote:
> Hello all,
>
> Thanks for rectifying the mail issues.
>
> This is Vishnu.
> I would Like to participate in GSOC 18 .And would like to work on :
>
> #Collect build statistics
>
> I needed some help in understanding the idea a bit more.
> i wanted to understand the idea more deeply.
> Can someone elaborate the idea more?

This might need some further explanation indeed, in particular because
we had a developer meeting last week and the ideas crystalised a bit
and we now have a slightly better understanding of what we want to
achieve.

This is our current overview of the package builds:
    https://build.macports.org/waterfall
The problem is that once the build has finished (the counter is at
roughly 60.000 per builder, sadly only the last 10.000 are kept), we
have absolutely no overview on a per-port and per-OS basis about whether
or not a particular build of that port succeeded (perhaps a month or a
year ago). The best we can do is check
    http://packages.macports.org/python27/
and see whether version X for macOS version Y exists or not, but
that's clumsy, if nothing else because a package may not exist there
just because of a licence conflict that prevents binary distribution.
Also, we have absolutely no overview of which ports built in version
X, but then failed in version Y etc.


First of all: just the "collect build statistics" part is slightly too
short on its own to fill the whole summer. I hope others will chime in
with their view, but given your interests and skillset I would suggest
thinking in one of the two directions proposed below (or a bit of
both).


Option A) Extend buildbot and create some views (requires Python,
JavaScript etc.)

This might include some modifications of the core buildbot code (which
is all Python), but the more important part would be to write dedicated
views for buildbot version 1.x.

We are currently using buildbot 0.8, but would eventually like to
switch to version 1.x. The main obstacle is that it lacks a view that
would be informative enough for us. (Since we are slightly "abusing"
buildbot in ways different than it was designed for, most of that info
is also lacking now in version 0.8.)

Here's a bit of background written by Pierre:

On 16 February 2018 at 14:02, Pierre Tardy wrote:
>
> Other people are also frustrated by the nine version of the waterfall,
> including the webkit people https://github.com/buildbot/buildbot/issues/3884
> I added lengthy explaination in this issue on why it is like that and why it
> is not easy to improve. But help is appreciated.
>
> My suggestion for webkit is also valid for macports. What you may need is a
> new UI plugin, which is carefully crafted for your requirements.
>
> I think this is a very good match for a GSOC project.
> I think this can be done by recruiting a front-end student.
> The project might need to develop some python code in order to speed-up the
> queries (like the one to matches every builds associated to a change
> https://github.com/buildbot/buildbot/issues/3927 ). This is something that I
> could help with.
...
> The goal of buildbot is to allow people with reasonable web frontend background
> to build their custom dashboard in 2 or 3 days.
> Each project does not require that, this is why we try to have some generic
> versions.
...
> The examples we have are the console/grid/waterfall plugins.

These are some proposals for views that I would for example like to
see *inside* the buildbot:
   https://trac.macports.org/ticket/55978

If the main focus were on this idea, the project would be co-mentored
by buildbot.


Option B) Create package index (which includes build info)

We desperately need a package index. Some examples:
- http://brewformulas.org/Python
- http://braumeister.org/formula/python

What we have now is merely:
- https://www.macports.org/ports.php?by=name&substr=python
and an old GSOC project for statistics that would need to be improved
(but it might be easier to rewrite it) and properly deployed:
- http://stats.macports.neverpanic.de/categories/11/ports/18576
There's also an undeployed rails application under:
- https://github.com/macports/macports-contrib/tree/master/mpwa/doc

During our developer meeting a few days ago Aljaž wrote a tool in a
couple of hours that creates some nice static pages for all ports:
    https://github.com/g5pw/macports-port-tree
(a sample output is in attachment, but Aljaž also incorporated some
build statistics for which I don't have an example here)
and Umesh started experimenting with dynamic pages in django 2 (we
have code for that too, but it's still in super preliminary phase).

What would currently make most sense for us would be a single dynamic
website with one page per port. Django sounds like a reasonable
framework, even though it is not the only one "allowed"; we would
definitely like to avoid Rails, since it requires a lot of maintenance
to keep it up to date and running, and there's a lot of magic going on
that's difficult to understand without mileage. Each port page would
combine:

- all the information about the port (description, version,
maintainer, homepage, variants, dependencies, dependent ports, ...)
- build summary (as scraped from buildbot history): which version was
built at what time on which os, successful or not, ...
  - including whether or not a binary package exists
- installation statistics (from users who opted in and send a json
file every now and then) with some graphs, similar to the stats page
above
- results of "livecheck": whether there's a newer version of the port available
- short git log with links to the last few changes to the Portfile
- links to trac tickets for that port
  https://trac.macports.org/query?0_port=python27&0_port_mode=%7E&0_status=%21closed
- (maybe links to pull requests for that port)
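
As a small illustration of the "links to trac tickets" item, the query
URL above can be generated per port. This is only a sketch: the helper
name is made up, and the meaning of the parameters ("~" presumably
being a "contains" match, "!closed" meaning "not closed") is inferred
from the example URL rather than from any official description.

```python
from urllib.parse import urlencode

TRAC_QUERY = "https://trac.macports.org/query"


def trac_tickets_url(port_name: str) -> str:
    """Build a Trac query URL for open tickets of one port (hypothetical helper)."""
    params = {
        "0_port": port_name,
        "0_port_mode": "~",     # assumed: "contains" match, as in the example URL
        "0_status": "!closed",  # assumed: every status except closed
    }
    # urlencode percent-encodes "!" as %21; "~" may be left unescaped,
    # which is equivalent to the %7E seen in the example URL.
    return f"{TRAC_QUERY}?{urlencode(params)}"
```

Calling trac_tickets_url("python27") should give a link equivalent to
the one listed above.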

We would also need (much easier to do) pages for maintainers (listing
all the ports they maintain, including the info which ports are
outdated), categories (listing all the ports in a category), search etc.

> How many web pages do we need to make ?

Ideally just one framework that will automatically generate all the tens
of thousands of pages for various ports :)

Basically we need to be able to:
- accept submissions of port usage statistics
- accept success/failure reports from the buildbot
- accept updates when Portfiles change, when ports are deleted etc.
- optionally accept build error reports from users who opt in (that
part would still need to be written on the Tcl side)
- show one page per port
- show pages for categories, maintainers
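
To make the "success/failure reports" and "one page per port" items a
bit more concrete, here is a minimal in-memory sketch of a per-port,
per-OS success matrix. Everything in it (class names, field names, the
idea of keeping only the latest build per port and OS) is an assumption
made for illustration; a real site would keep this in a database behind
whatever framework gets chosen.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BuildReport:
    """One build result as the website might receive it (hypothetical fields)."""
    port: str          # e.g. "python27"
    version: str       # e.g. "2.7.14_1"
    os_version: str    # e.g. "10.13"
    success: bool
    finished_at: int   # Unix timestamp of the end of the build


class BuildIndex:
    """Keeps the most recent report per (port, OS version)."""

    def __init__(self) -> None:
        # port name -> {os_version: latest BuildReport}
        self._latest: Dict[str, Dict[str, BuildReport]] = defaultdict(dict)

    def record(self, report: BuildReport) -> None:
        """Store the report unless a newer one is already known."""
        current = self._latest[report.port].get(report.os_version)
        if current is None or report.finished_at >= current.finished_at:
            self._latest[report.port][report.os_version] = report

    def matrix_row(self, port: str, os_versions: List[str]) -> Dict[str, Optional[bool]]:
        """Return True/False per OS version, or None where no build is known."""
        builds = self._latest.get(port, {})
        row: Dict[str, Optional[bool]] = {}
        for os_version in os_versions:
            report = builds.get(os_version)
            row[os_version] = None if report is None else report.success
        return row
```

One row of this matrix per port would be enough to render the build
summary on a port page; keeping the full history instead of only the
latest build would additionally answer "built in version X, failed in
version Y".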

That said ... our complete project website is from the stone age, so any
changes there would also be welcome :)

> Can i get some sample structure of webpage for 1 port . what all need to be
> there.

I hope it's described well enough above, but feel free to ask
additional questions.

> Also where to get : collect per-port statistics & success matrix .Can i get
> the exact sample link?

This would have to combine two separate steps:
- Step 1: scrape old build statistics, see https://build.macports.org/json/help
- Step 2: modify buildbot setup to automatically send the build report
(in json) to the new website

This is our current buildbot setup:
    https://github.com/macports/macports-infrastructure/blob/master/buildbot/master.cfg

A proof-of-concept for step 1 has in fact already been written during
the developer meeting last week and should not need more than a day or
two to complete + probably a few days to run to fetch the data from
the server. We would probably need to add some more info to buildbot
setup (for example, it's currently pretty cumbersome to extract the
version of the package being built).
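
For orientation, step 1 could look roughly like the sketch below. It is
not the proof-of-concept mentioned above. It assumes the usual buildbot
0.8 JSON layout (builds under /json/builders/<builder>/builds/<number>,
"properties" as a list of [name, value, source] triples, "results"
equal to 0 for success), which should be checked against the /json/help
page. The property name "portname" is a guess and is therefore a
parameter.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://build.macports.org/json"
SUCCESS = 0  # assumed buildbot result code for a successful build


def build_url(builder: str, number: int) -> str:
    """URL of one build in the JSON status API (layout assumed, see /json/help)."""
    return f"{BASE_URL}/builders/{quote(builder, safe='')}/builds/{number}"


def summarize_build(build: dict, port_property: str = "portname") -> dict:
    """Reduce one build's JSON to the few fields a per-port page needs."""
    # Properties are assumed to arrive as [name, value, source] triples.
    props = {p[0]: p[1] for p in build.get("properties", [])}
    times = build.get("times") or [None, None]
    return {
        "builder": build.get("builderName"),
        "number": build.get("number"),
        "port": props.get(port_property),  # hypothetical property name
        "success": build.get("results") == SUCCESS,
        "started": times[0],
        "finished": times[1],
    }


def fetch_build(builder: str, number: int) -> dict:
    """Download and decode one build; needs network access."""
    with urlopen(build_url(builder, number)) as response:
        return json.load(response)
```

Looping fetch_build over the build numbers that are still kept on each
builder and storing the summaries would give the raw data for the
per-port history; a build that is still running has no result yet and
is reported here as not successful.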

> I really think this is the project i could work on because i have prior
> experience with all the languages required for this project (HTML, Python,
> javascript, in fact, did my last intern mainly involving JSON).
> I also have good experience in web Scraping.
>
> I am well aware about :
> JSON
> Html
> Css
> Python
> JAVASCRIPT
> Basic idea about perl
>
> Have used macOS.
> Access to macos during gsoc would be limited. But possible.
> I have certain knowledge of macports too.

If working on buildbot, you should be able to set up a working
buildbot installation on your machine, both for version 0.8 and 1.x.
You didn't mention which is your main platform. I don't know if that
works on Windows (it might), but Windows might be a lot more tricky,
so I hope you at least have full access to Linux. That would help in
the second idea as well.

MacPorts base should also build on Linux (if it doesn't we can
probably fix it). Most packages won't install on a linux box, but it
should be enough to run various commands that would be needed to set
up various parts of the website.

Mojca


PS: here's an example of what the current statistics website gets from
someone who opts in for statistics submissions:
{
  "id": "some-unique-id",
  "os": {
    "macports_version": "2.4.2",
    "osx_version": "10.13",
    "os_arch": "i386",
    "os_platform": "darwin",
    "build_arch": "x86_64",
    "xcode_version": "9.2"
  },
  "active_ports": [
    {"name": "python2_select", "version": "0.0_2"},
    {"name": "python_select", "version": "0.3_7"},
    {"name": "llvm_select", "version": "2_0"},
    {"name": "cctools", "version": "895_4", "variants": "llvm40 +"},
    {"name": "xorg-xcb-proto", "version": "1.12_1", "variants": "python27 +"},
    {"name": "SuiteSparse", "version": "4.2.1_4", "variants": "accelerate +"},
    {"name": "wxPython-3.0", "version": "3.0.2_5", "requested": "true"},
    {"name": "p5.26-wx", "version": "0.993.200_0", "requested": "true"}
  ]
}
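
A minimal sketch of how such submissions could be turned into per-port
installation counts. The function name is made up, and the only
assumptions about the format are the ones visible in the example above:
an "active_ports" list whose entries have a "name" and optionally
"requested" set to the string "true".

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def count_ports(submissions: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count how many submissions have each port installed / requested."""
    installed: Counter = Counter()
    requested: Counter = Counter()
    for raw in submissions:
        data = json.loads(raw)
        for port in data.get("active_ports", []):
            installed[port["name"]] += 1
            # "requested" is a string in the example, not a JSON boolean.
            if port.get("requested") == "true":
                requested[port["name"]] += 1
    return installed, requested
```

A real implementation would also have to deduplicate by "id", so that
repeated submissions from the same user are counted only once.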

