Gsoc 18 Project | Collect build statistics

Vishnu vishnum1998 at gmail.com
Tue Apr 3 15:47:20 UTC 2018


Hello,

First of all, I am not clear about which document you are talking about
perfecting.

I will now focus on Django, as mentioned in my earlier emails.

Thanks a lot for that in-depth reply. I will work on everything related
to the HTML after I finish working with Django.

Thanks

On Tue, Apr 3, 2018, 9:04 PM Mojca Miklavec <mojca at macports.org> wrote:

> Dear Vishnu,
>
> Thank you very much for sharing the document. The purpose of this HTML
> was two-fold:
> - demonstrating your skills
> - first step of the planning phase for the actual implementation
>
> Below I'm providing some feedback, but I would suggest concentrating
> on a simple Django app at this moment and then returning to this
> HTML once you are "done" with Django to address (some of) the
> comments. In short: if you are selected, I'll insist on making this
> document "perfect" before proceeding (and on addressing all the
> feedback + more I haven't bothered writing yet), but there's no point
> in asking you to spend a week making this document ten times longer
> and fixing tiny unimportant details that don't really demonstrate the
> skillset :)
>
> On 2 April 2018 at 23:32, Mojca Miklavec wrote:
> > On Mon, 2 Apr 2018 at 19:49, Vishnu wrote:
> >>
> >> In the database,
> >> because then it would be very easy to count the number of OSes for
> >> that port.
> >
> > I'll explain tomorrow why this is suboptimal. (But there's no need to
> > further optimise the database design right now.)
>
> There are probably better resources that explain this, but here's the
> first hit from Google:
>
> https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
> https://en.wikipedia.org/wiki/Database_normalization
>
> In an extreme case, imagine that we decide to send a questionnaire to our
> participants of statistics collection, asking them some 100 optional
> questions, including anything from gender, age, country of origin,
> country of current residence, education, favourite animal, ... Then we
> decide that we would want to compare the age distribution of users of
> package A vs. age distribution of users of package B.
>
> Your idea that allows "very easy number counting" would mean that:
>
> At the moment you only have (submission id, port, port version,
> variants) in the table. You would need to extend the table to contain
>     (submission id, submission time, user id, port, port version,
> variants, os version, stdlib, xcode version, age, gender, country,
> education, favourite animal, ...)
> And if the user has 1000 ports installed, you would need to store
> 100x1000 cells (repeat that same information one thousand times and
> then again in any subsequent submission from the same user) instead of
> having a single copy in a separate "questionnaire" table. Multiply
> that by 10,000 users submitting statistics and you end up with tens
> of gigabytes of data each month, just to store results of that
> one-time questionnaire.
>
> On top of that, once the user submits a questionnaire, if you keep
> those answers in a separate table and use proper SQL queries, you
> could easily get the answer to question "what was the prevailing
> gender of users of package A" even for submissions that were made many
> months ago. If you store everything into a single monstrous table, you
> would either need to modify plenty of old submissions or you would not
> be able to get that information for old submissions at all.
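A minimal sketch of the normalized layout described above, using Python's built-in sqlite3 module. The table names (`submission`, `installed_port`, `questionnaire`), their columns, the `prevailing_gender` helper and the sample rows are all hypothetical. The point it illustrates: each questionnaire answer is stored once per user, and a join attributes it to every submission from that user, including ones made months before the questionnaire was answered.

```python
import sqlite3

# Hypothetical schema: submission-level facts are stored once per submission,
# per-port rows only reference the submission, and questionnaire answers are
# stored once per user.
SCHEMA = """
CREATE TABLE submission (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    submitted TEXT NOT NULL,          -- ISO date
    os_version TEXT NOT NULL
);
CREATE TABLE installed_port (
    submission_id INTEGER NOT NULL REFERENCES submission(id),
    port TEXT NOT NULL,
    version TEXT NOT NULL,
    variants TEXT NOT NULL DEFAULT ''
);
CREATE TABLE questionnaire (
    user_id TEXT PRIMARY KEY,
    gender TEXT
);
"""

def prevailing_gender(conn, port):
    """Most common answer among users who ever reported `port`, or None."""
    row = conn.execute(
        """
        SELECT q.gender, COUNT(DISTINCT s.user_id) AS users
        FROM installed_port AS p
        JOIN submission AS s ON s.id = p.submission_id
        JOIN questionnaire AS q ON q.user_id = s.user_id
        WHERE p.port = ?
        GROUP BY q.gender
        ORDER BY users DESC, q.gender
        LIMIT 1
        """,
        (port,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO submission VALUES (?, ?, ?, ?)",
    [(1, "u1", "2017-11-01", "10.12"),
     (2, "u2", "2018-03-01", "10.13"),
     (3, "u3", "2018-03-02", "10.13")],
)
conn.executemany(
    "INSERT INTO installed_port VALUES (?, ?, ?, ?)",
    [(1, "A", "1.0", ""), (2, "A", "1.1", ""),
     (3, "A", "1.1", ""), (3, "B", "2.0", "")],
)
# Answers are inserted once per user, long after submission 1 was made,
# yet the join still links them to that old submission.
conn.executemany(
    "INSERT INTO questionnaire VALUES (?, ?)",
    [("u1", "female"), ("u2", "female"), ("u3", "male")],
)
print(prevailing_gender(conn, "A"))  # female
```

Nothing in `submission` or `installed_port` had to be rewritten when the questionnaire answers arrived.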
>
> Additionally, it could happen that while you are updating old
> submissions, the database crashes. You could end up with half of the
> entries updated and the other half left at their old value, in
> inconsistent state. There are plenty of problems if you don't make
> sure that you keep your database design in a good shape from the very
> beginning.
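If a bulk update of old rows is ever unavoidable, wrapping it in a transaction keeps it all-or-nothing. A minimal sketch with Python's sqlite3 module, where a Python exception stands in for the crash; the `submission` table and the `update_all` and `os_versions` helpers are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission (id INTEGER PRIMARY KEY, os_version TEXT)")
with conn:  # commits the sample rows
    conn.executemany("INSERT INTO submission VALUES (?, ?)",
                     [(1, "10.12"), (2, "10.13"), (3, "10.13")])

def update_all(conn, new_value, fail_after=None):
    """Rewrite os_version in every row; `fail_after` simulates a crash."""
    ids = [row[0] for row in conn.execute("SELECT id FROM submission ORDER BY id")]
    # `with conn` commits if the block finishes and rolls back if it raises,
    # so the table never ends up half-updated.
    with conn:
        for n, sub_id in enumerate(ids):
            if fail_after is not None and n == fail_after:
                raise RuntimeError("simulated crash")
            conn.execute("UPDATE submission SET os_version = ? WHERE id = ?",
                         (new_value, sub_id))

def os_versions(conn):
    return [row[0] for row in
            conn.execute("SELECT os_version FROM submission ORDER BY id")]

try:
    update_all(conn, "10.14", fail_after=2)  # two rows updated, then "crash"
except RuntimeError:
    pass
print(os_versions(conn))  # ['10.12', '10.13', '10.13']
```

A transaction only protects the update itself; a normalized design avoids the need to touch old submissions in the first place.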
>
>
> That's a super common use case in databases that has already been
> solved. One should use table joins and views. Random link (I'm sure
> there are better ones):
>     https://db.grussell.org/sql3.html
>
> I don't know how Django handles joins and views (some hints I skimmed
> through are here https://stackoverflow.com/a/1281051/585897), but one
> should certainly make sure that the database design is done well.
> Learning more about that topic is part of the process.
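A minimal sketch of a join wrapped in a view, applied to the "count the OSes for a port" question from earlier in the thread. It uses plain SQL through Python's sqlite3 module so that it runs standalone; Django's ORM can express a similar aggregate with `values()` and `annotate()` together with `Count`. The table names, the `port_os_count` view and the `os_breakdown` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submission (
    id INTEGER PRIMARY KEY,
    os_version TEXT NOT NULL
);
CREATE TABLE installed_port (
    submission_id INTEGER NOT NULL REFERENCES submission(id),
    port TEXT NOT NULL
);
-- The OS version lives only in `submission`; the view joins it back in,
-- so counting OS versions per port stays a one-line query.
CREATE VIEW port_os_count AS
    SELECT p.port, s.os_version, COUNT(DISTINCT s.id) AS submissions
    FROM installed_port AS p
    JOIN submission AS s ON s.id = p.submission_id
    GROUP BY p.port, s.os_version;
""")
conn.executemany("INSERT INTO submission VALUES (?, ?)",
                 [(1, "10.12"), (2, "10.13"), (3, "10.13")])
conn.executemany("INSERT INTO installed_port VALUES (?, ?)",
                 [(1, "A"), (2, "A"), (3, "A"), (3, "B")])

def os_breakdown(conn, port):
    """(os_version, submissions) pairs for one port, read through the view."""
    return conn.execute(
        "SELECT os_version, submissions FROM port_os_count"
        " WHERE port = ? ORDER BY os_version",
        (port,),
    ).fetchall()

print(os_breakdown(conn, "A"))  # [('10.12', 1), ('10.13', 2)]
```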
>
>
> On 2 April 2018 at 23:50, Vishnu wrote:
> >
> > Please go through this https://jsfiddle.net/vishnum98/3r4vL4L3/21/
> >
> > I made some changes.
>
> Thank you very much. The chart looks ok. For the remaining (missing)
> charts just add a section (and optionally an empty box) and describe
> what kind of chart goes there (no need for a long paragraph, just make
> it clear what's on the Y axis).
>
> I don't think we need a drop-down to select a version, but now that
> you put it there, what I think would be helpful to have there is
> something to switch between:
> - absolute number of installations in that month
> - number of installations of that port divided by total number of
> submissions in that month
> That is: having both absolute and relative numbers available.
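A minimal sketch of the two numbers for one port per month. The input shape, one `(month, set_of_installed_ports)` pair per submission, and the `monthly_counts` helper are hypothetical simplifications of what the database would return:

```python
from collections import Counter

def monthly_counts(submissions, port):
    """Return {month: (absolute, relative)} for `port`.

    absolute = submissions in that month that list the port
    relative = absolute / all submissions in that month
    """
    total = Counter()
    with_port = Counter()
    for month, ports in submissions:
        total[month] += 1
        if port in ports:
            with_port[month] += 1
    return {m: (with_port[m], with_port[m] / total[m]) for m in sorted(total)}

data = [
    ("2018-02", {"A", "B"}),
    ("2018-02", {"B"}),
    ("2018-03", {"A"}),
    ("2018-03", {"A", "B"}),
    ("2018-03", {"B"}),
    ("2018-03", {"A", "C"}),
]
print(monthly_counts(data, "A"))  # {'2018-02': (1, 0.5), '2018-03': (3, 0.75)}
```

The relative number keeps months comparable when the total number of submitters grows or shrinks; a rise in the absolute count alone may just reflect more participants.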
>
> To make it clear: don't bother actually implementing this now. You can
> add a placeholder to remind you about that later (or just change the
> contents of that drop-down to do this instead), nothing else.
>
> We are mainly interested in the cumulative number of installations of
> a particular port. The version does tell us something, but not *that*
> much, except that the user has not updated their ports for at least a
> month. We could
> potentially make a cumulative diagram listing all versions, random
> example:
>
> https://kanbanize.com/blog/wp-content/uploads/2014/01/Cumulativeflowfinal.png
> but I would worry about that *at the very end*.
>
> A much better *global* measure would be the time since
> the user last updated PortIndex, but I have no clue how to get that
> information in a reliable way (and it's certainly not your task to
> worry about it).
>
>
> Further comments:
>
> * Some more items from the proposal are still missing, like whether
> the package is outdated, latest commits, link to tickets, ... No need
> to do anything fancy, just put some placeholder there.
>
> * Build statistics will need more work. I mean: the table as it is
> looks nice. But we'll probably want to represent the information in
> two different ways. One way listing all builds the way you did now.
> And the other one in approximately this way:
>
> https://trac.macports.org/ticket/55978#Viewnr.3:Overviewofhistoryofbuildsofaparticularport
>
> * I'll save more nitpicking for later :)
>
> Mojca
>


More information about the macports-dev mailing list