Gsoc 18 Project | Collect build statistics

Mojca Miklavec mojca at macports.org
Mon Mar 26 11:53:38 UTC 2018


Dear Vishnu,

First some general remarks, then the answers to your questions below.

- One of the most important things that we should discuss and improve
is the timeline. This is very little work, but it requires some
coordination to make sure that the milestones are defined in the most
sensible way. One option is to discuss this over IRC, but email is also
fine. I'll write about that separately.

- I suggest writing the milestones in a separate section, for example
something along these lines:
  May X: The first version of the database design is ready and the
first import of the portindex is finished.
  May Y: The prototype website gets deployed at a temporary location
xyz.com. It is possible to show the most basic properties of any port
"foo" on a plain HTML page via xyz.com/index/port/foo and list all
ports via xyz.com/index/list
  ...
  June Z: The website is accepting installation statistics submissions
from users.

- At the moment your schedule says "June 11-15: Phase 1 Evaluation",
with no plans to code during those days and no explicit criteria that
would define whether the project passed or not. It makes sense to put
the evaluation on the schedule, so that you see where it is, but note
that it doesn't take any of your time: that's when the mentors are busy,
and that's when you need to make sure to show the results that were
agreed on.
(There can be valid reasons why the milestones would not be met, for
example if the plan changes during coding because a better solution or
idea is found, so the milestones are not set in stone. But in case of
disagreement between a student and a mentor, it helps a lot if the
criteria are straightforward to check for outsiders or admins as well.)
I didn't check whether those days fall on a weekend, but don't make it
sound like you would be spending all your time on evaluation. You only
need to fill in a simple web form some time during the evaluation
period, that's all.

- Please do take a look at how relational databases work and get
familiar with some basic object-oriented programming in case you don't
know those concepts already. This is of crucial importance for the
success of the project. In case you work in Django (this is not the
only option, there are other possibilities as well), you should be
familiar with
    https://docs.djangoproject.com/en/2.0/intro/tutorial02/
(this is part of a slightly longer tutorial). We need a single uniform
website rather than something cobbled together from random scripts
that do individual tasks without knowing about each other.

- After the proposal gets submitted (because that's the highest
priority task at the moment), we would still want to verify your
skills with a simple coding challenge before we request slots (so
ideally the challenge should be complete roughly one week after the
deadline for proposal submission). Other organisations would ask for a
pull request, but in this case it probably makes more sense to
demonstrate a simple prototype. You may pick a slightly different
task, but it should demonstrate a comparable skill set or ability to
learn. My suggestion would be to do these two tasks:
(a) Create a simple (static) website for one single port of your
choice and include nearly everything that will be part of the final
product. You can copy some data from a Portfile to help you, but you
can just as well make up the information, statistics etc. I would
actually consider this to be part of the proposal. The page should
demonstrate what the final product will look like (only with fake data)
and will also help you while developing the actual functionality of
the website: you will know where you are heading. This should not
take you more than a few hours; you basically need to visualize what
should already be in the proposal.
(b) Pick a framework to use and create a super simple hello-port
application. In case you pick Django, go to
    https://docs.djangoproject.com/en/2.0/intro/tutorial01/
or
    https://devcenter.heroku.com/articles/getting-started-with-python#introduction
or any other tutorial of your choice, install the framework and the
database, follow the steps towards creating a simple app and modify
the app in such a way that it will implement *any* super simple
functionality from the final application. You could, for example, import
three ports and create a listing of those three ports with a link to
each port page, which does nothing else but print the port name and
description. You may ask for help in case you are struggling with the
installation of dependencies.
(c - fully optional) If you still have time and motivation, try to
make the page from (a) look nice with basic styling (you could use
any existing framework for frontend development or ask a friend to help
you with this particular one :)
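To give a rough idea of how small task (b) can be, here is a sketch of a hello-port application. Instead of a framework such as Django, it uses only Python's standard-library WSGI interface. The three port names, their placeholder descriptions and the URL layout (`/` for the listing, `/port/<name>` for a port page) are made up for illustration.

```python
from html import escape

# Three made-up ports with placeholder descriptions (illustration only).
PORTS = {
    "foo": "Placeholder description of port foo",
    "bar": "Placeholder description of port bar",
    "baz": "Placeholder description of port baz",
}

PAGE = "<!DOCTYPE html><html><body>{}</body></html>"


def render_index() -> str:
    """Listing page: one link per port."""
    items = "".join(
        f'<li><a href="/port/{escape(name)}">{escape(name)}</a></li>'
        for name in sorted(PORTS)
    )
    return PAGE.format(f"<h1>Ports</h1><ul>{items}</ul>")


def render_port(name: str) -> str:
    """Port page: nothing but the name and the description."""
    return PAGE.format(f"<h1>{escape(name)}</h1><p>{escape(PORTS[name])}</p>")


def app(environ, start_response):
    """WSGI entry point: '/' lists the ports, '/port/<name>' shows one port."""
    path = environ.get("PATH_INFO", "/")
    name = path[len("/port/"):] if path.startswith("/port/") else None
    if path in ("", "/"):
        status, body = "200 OK", render_index()
    elif name in PORTS:
        status, body = "200 OK", render_port(name)
    else:
        status, body = "404 Not Found", PAGE.format("<p>No such port</p>")
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it at http://localhost:8000/. In Django the same two pages would be two view functions registered in `urls.py`, with the ports coming from the database instead of a dict.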

On 26 March 2018 at 11:39, Vishnu wrote:
>
> Can you help me find where that JSON API works from? Where is the code
> that sends requests to the backend to get that data?

You mean in the planned website or on our buildbot setup?

For the planned website you would need to write it yourself.

For our buildbot setup I honestly have absolutely no idea. The source
code is here if you want to check:
    https://github.com/buildbot/buildbot/tree/eight

> Once I know that, it will simplify my job of sending requests to the
> backend for a port.

But why would you talk to the backend directly? When I said that there
is a JSON API, I meant that you simply fetch a JSON file from the
server; there is no need to talk to the backend, that's what the API
is made for.
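For illustration, fetching from a JSON API is just an HTTP GET followed by JSON decoding, and no backend code is involved. The sketch below makes several assumptions that must be checked against what the actual build master returns:
- the base URL is hypothetical;
- the path layout `/json/builders/<builder>/builds/<number>` is the assumed Buildbot 0.8 style;
- the field names used in `summarize_build` are assumed, and the `portname` property is hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL: replace with the real build master address.
BUILDBOT_JSON = "https://build.example.org/json"


def build_url(builder: str, number: int) -> str:
    """URL of one build, assuming the Buildbot 0.8 style JSON path layout."""
    return f"{BUILDBOT_JSON}/builders/{quote(builder)}/builds/{number}"


def fetch_json(url: str, timeout: float = 30.0):
    """Plain HTTP GET of a JSON document, decoded into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_build(build: dict) -> dict:
    """Keep only the fields the website needs from one build's JSON.

    The keys used here ('builderName', 'number', 'results', 'times',
    'properties') and the 'portname' property are assumptions to verify.
    """
    # Assumed: properties come as [name, value, source] triples.
    props = {p[0]: p[1] for p in build.get("properties", [])}
    return {
        "builder": build["builderName"],
        "number": build["number"],
        "result": build.get("results"),  # assumed integer status code
        "started": build["times"][0],    # assumed [start, end] timestamps
        "port": props.get("portname"),   # hypothetical property name
    }
```

A typical call would be `summarize_build(fetch_json(build_url(builder, number)))`. The result is a small dict that can be stored in the website's own database.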

OK, one way to solve this (if you really needed super efficiency)
would be to copy the complete database and all the additional log
files to your computer and do the analysis straight on the database,
but I don't think that's worth the effort.

> Also, is there any existing method to show all ports of a maintainer,
> or all ports of a category, say python?

There is a port command that can list all the ports of a particular
maintainer or all ports in a particular category, but it would be way
too inefficient to use that for the website. You need to support that
in the website app itself.

Once you read the 15 MB ports.json file (created with portindex2json),
you'll have all the information about who maintains a particular port
and which categories the port belongs to, and you need to store that
information. You cannot afford to read the file and iterate through
all entries to check whether maintainer A maintains a given port every
single time someone loads the website.
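Here is a minimal sketch of storing that information once, using SQLite. It assumes a simplified ports.json layout in which each entry has `name`, `description`, and flat lists of `maintainers` and `categories` strings. The real portindex2json output may nest these differently. Two join tables record which port has which maintainer and category. Indexes on them turn "all ports of maintainer A" into a lookup instead of a scan over the whole file.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE ports (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE port_maintainers (
    port_id INTEGER NOT NULL REFERENCES ports(id),
    maintainer TEXT NOT NULL,
    PRIMARY KEY (port_id, maintainer)
);
CREATE TABLE port_categories (
    port_id INTEGER NOT NULL REFERENCES ports(id),
    category TEXT NOT NULL,
    PRIMARY KEY (port_id, category)
);
-- Indexes make "all ports of maintainer X / in category Y" a lookup.
CREATE INDEX idx_maintainer ON port_maintainers(maintainer);
CREATE INDEX idx_category ON port_categories(category);
"""


def import_ports(conn, entries):
    """Store parsed ports.json entries once (not on every page view)."""
    with conn:  # one transaction for the whole import
        for entry in entries:
            cur = conn.execute(
                "INSERT INTO ports (name, description) VALUES (?, ?)",
                (entry["name"], entry.get("description")),
            )
            port_id = cur.lastrowid
            conn.executemany(
                "INSERT OR IGNORE INTO port_maintainers VALUES (?, ?)",
                [(port_id, m) for m in entry.get("maintainers", [])],
            )
            conn.executemany(
                "INSERT OR IGNORE INTO port_categories VALUES (?, ?)",
                [(port_id, c) for c in entry.get("categories", [])],
            )


def import_file(conn, path):
    """Read the (assumed) JSON array from disk and import it."""
    with open(path, encoding="utf-8") as f:
        import_ports(conn, json.load(f))


def ports_of_maintainer(conn, maintainer):
    rows = conn.execute(
        """SELECT p.name FROM ports p
           JOIN port_maintainers m ON m.port_id = p.id
           WHERE m.maintainer = ? ORDER BY p.name""",
        (maintainer,),
    )
    return [name for (name,) in rows]


def ports_in_category(conn, category):
    rows = conn.execute(
        """SELECT p.name FROM ports p
           JOIN port_categories c ON c.port_id = p.id
           WHERE c.category = ? ORDER BY p.name""",
        (category,),
    )
    return [name for (name,) in rows]
```

A page view then runs one indexed query, such as `ports_in_category(conn, "python")`, instead of parsing 15 MB of JSON.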

Asking `port` to retrieve that information for you will not only be
equally slow, but it will also make the website difficult to deploy.

> It would be best if I start my GSoC work with the 3 most important things:
> 1) Basic port information through portindex2json

Yes, this is probably the most important piece of information that
will lay foundation for everything else.

> 2) Port Build history. It would use JSON API

OK. But just to make sure that we are on the same page: you should use
the JSON API to retrieve information from the build master *once* per
build, not each time someone looks at your website. You should then
store the results internally.
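One way to guarantee that each build is stored only once is to give the builds table a composite primary key of builder name and build number, and to insert with `INSERT OR IGNORE`. Below is a minimal SQLite sketch. The column set and the keys of the `summary` dict are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS builds (
    builder TEXT NOT NULL,
    number INTEGER NOT NULL,
    port TEXT,
    result INTEGER,
    started REAL,
    PRIMARY KEY (builder, number)   -- at most one row per build
);
"""


def store_build(conn: sqlite3.Connection, summary: dict) -> bool:
    """Insert one build summary; return False if it was already stored."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO builds (builder, number, port, result, started) "
            "VALUES (:builder, :number, :port, :result, :started)",
            summary,
        )
    return cur.rowcount == 1  # 0 when the primary key already existed


def is_stored(conn: sqlite3.Connection, builder: str, number: int) -> bool:
    """Cheap check before making any request to the build master."""
    row = conn.execute(
        "SELECT 1 FROM builds WHERE builder = ? AND number = ?", (builder, number)
    ).fetchone()
    return row is not None
```

The website then reads build history only from this table, so visitor traffic never reaches the build master.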

> 3) Updating the buildbot code to send updates to the logs.

I would say that for the time being, if you implement no. 2 anyway,
you could check the buildbot master for updates and use the same
JSON API to, say, update the status once per hour or every 10 minutes.
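The periodic check can be sketched as follows. Remember the highest build number already stored for each builder, ask the master for the latest finished build number, and fetch only the numbers in between. The three callables are hypothetical hooks: for example one request to the JSON API, one database query, and one fetch-and-store step. The sketch assumes build numbers are consecutive integers per builder. The 600-second default corresponds to the 10 minutes mentioned above.

```python
import time
from typing import Callable, List


def missing_builds(last_stored: int, latest_finished: int) -> List[int]:
    """Build numbers finished on the master but not yet stored locally."""
    return list(range(last_stored + 1, latest_finished + 1))


def poll_once(get_latest_finished: Callable[[], int],
              get_last_stored: Callable[[], int],
              fetch_and_store: Callable[[int], None]) -> int:
    """One polling round for one builder; returns how many builds were fetched."""
    numbers = missing_builds(get_last_stored(), get_latest_finished())
    for number in numbers:
        fetch_and_store(number)  # one JSON request per new build, never repeated
    return len(numbers)


def poll_forever(get_latest_finished, get_last_stored, fetch_and_store,
                 interval_seconds: float = 600.0) -> None:
    """Repeat poll_once every interval_seconds (600 s = every 10 minutes)."""
    while True:
        poll_once(get_latest_finished, get_last_stored, fetch_and_store)
        time.sleep(interval_seconds)
```

For a builder with nothing stored yet, `get_last_stored` could return -1 so that the first round starts at build 0. Later, a buildbot-side notification could replace the polling loop without changing the storage code.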

As I already said, I would give a higher priority to statistics
collection. You should be able to accept and store installation
statistics submissions by the first milestone, else there will be no
time to improve the code based on all the problems you might find in
the process. Point 3 (buildbot code update) is merely an
implementation detail to allow more efficient collection of build
statistics and can be implemented later.
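To make "accept and store installation statistics submissions" concrete, here is a minimal sketch of the storage side. It assumes a hypothetical JSON submission of the form `{"uuid": ..., "os_version": ..., "ports": [{"name": ..., "version": ...}]}`; the real format still has to be designed. Malformed input is rejected with `ValueError`, so that a web handler could answer with HTTP 400.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    user_uuid TEXT NOT NULL,
    os_version TEXT NOT NULL,
    received_at TEXT NOT NULL
);
CREATE TABLE submitted_ports (
    submission_id INTEGER NOT NULL REFERENCES submissions(id),
    port TEXT NOT NULL,
    version TEXT NOT NULL
);
"""


def accept_submission(conn: sqlite3.Connection, raw: bytes) -> int:
    """Validate one JSON submission, store it, and return its new id.

    Raises ValueError for malformed input (json.JSONDecodeError is a
    ValueError subclass, so invalid JSON is covered as well).
    """
    data = json.loads(raw)
    try:
        user = str(uuid.UUID(str(data["uuid"])))  # reject non-UUID identifiers
        os_version = str(data["os_version"])
        ports = [(str(p["name"]), str(p["version"])) for p in data["ports"]]
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed submission: {exc!r}") from exc
    with conn:  # both tables are written in one transaction
        cur = conn.execute(
            "INSERT INTO submissions (user_uuid, os_version, received_at) "
            "VALUES (?, ?, ?)",
            (user, os_version, datetime.now(timezone.utc).isoformat()),
        )
        conn.executemany(
            "INSERT INTO submitted_ports (submission_id, port, version) "
            "VALUES (?, ?, ?)",
            [(cur.lastrowid, name, version) for name, version in ports],
        )
    return cur.lastrowid
```

Having this working early leaves time to see what real submissions look like and to adjust the format before the later milestones.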

> And as you said, logs would be ineffective to store. For the history of
> builds of the port, I think a log file would be the best way to store it.

I still cannot imagine an efficient implementation of a scalable
website with lots of information based on data stored in plain
text files.

You REALLY REALLY should be looking at object-oriented design of
software and have a closer look at relational databases and try to
understand the basic concepts there (tables, rows, primary keys,
foreign keys etc).
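Here is a small illustration of these concepts, using SQLite from the Python standard library; the table and class names are hypothetical. Each row of `ports` is identified by its primary key. `builds.port_id` is a foreign key pointing at such a row, so a build cannot refer to a port that does not exist. The `Port` class is the object-oriented view of one row. An ORM such as Django's model layer generates this kind of mapping for you.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE ports (
            id INTEGER PRIMARY KEY,        -- primary key: unique id of each row
            name TEXT NOT NULL UNIQUE,     -- no two rows describe the same port
            description TEXT NOT NULL
        );
        CREATE TABLE builds (
            id INTEGER PRIMARY KEY,
            port_id INTEGER NOT NULL REFERENCES ports(id),  -- foreign key
            result TEXT NOT NULL
        );
        """
    )


@dataclass
class Port:
    """A port as a Python object; each instance corresponds to one row."""
    name: str
    description: str
    id: Optional[int] = None  # assigned by the database on save()

    def save(self, conn: sqlite3.Connection) -> None:
        with conn:
            cur = conn.execute(
                "INSERT INTO ports (name, description) VALUES (?, ?)",
                (self.name, self.description),
            )
        self.id = cur.lastrowid

    def add_build(self, conn: sqlite3.Connection, result: str) -> None:
        with conn:
            conn.execute(
                "INSERT INTO builds (port_id, result) VALUES (?, ?)",
                (self.id, result),
            )

    @classmethod
    def get_by_name(cls, conn: sqlite3.Connection, name: str) -> Optional["Port"]:
        row = conn.execute(
            "SELECT name, description, id FROM ports WHERE name = ?", (name,)
        ).fetchone()
        return cls(*row) if row else None
```

Inserting a build whose `port_id` matches no port raises `sqlite3.IntegrityError`. This is the kind of consistency guarantee that plain text files cannot give you.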

While you can certainly hack something quickly with plain text files,
the solution will not scale well.

I would suggest including the database layout in your proposal:
how you will store the information you need for the described
functionality of the website.

> And regarding the syntax: I was thinking of storing the build history in
> JSON format, as one line in the log.txt of a particular port.
>
> Also Mojca, can I know your working hours, or at least the time zone
> where you live?

I'm at UTC+2. Working hours are hard to define precisely, but I try to
sleep at night :)
Umesh is in the same time zone as you are and knows enough details to
be able to help you (if he has time).

Mojca


More information about the macports-dev mailing list