Gsoc 18 Project | Collect build statistics

Mojca Miklavec mojca at macports.org
Fri May 11 07:17:36 UTC 2018


On 11 May 2018 at 04:00, Vishnu wrote:
> Hi
>
> Also, I wanted to ask about
> this link:
> https://www.macports.org/ports.php?by=name&substr=python27
>
> There is already an existing database with loads of information.
> Is it not updated?
> Is it static?

It is not static, the database is updated via an external script from
PortIndex and the page is rendered with php code, reading information
from the database.

> If it is updated automatically, then half of the work on the database would
> already be done.
>
> Could i get access to the DB?

Once your application reaches a certain level of quality and completeness,
it should render the above page obsolete, and that page will most
likely be removed.

While we could theoretically give you access to the database (though
not without opening some security holes), this hardly makes any sense.
First of all, you should always be able to reproduce the exact state
of the database by running PortIndex2PGSQL.tcl yourself. Most
importantly, however, we don't want to introduce a complex network of
interdependencies.

Let's say that I need to change the new app a few months from now.
Before I can deploy the change, I need to test it locally on my
machine. If your app requires password-protected or IP-limited
access to some third-party database, this would greatly hinder my
ability to patch the new app. The app should be as standalone as
possible. The next very important aspect is that you might want to
include more information than what's currently available in that
database. For example, you might decide to include information
about which port supports which version of macOS. That information is
not available in the existing database, so you would first need
to patch who-knows-what-other-code in a random mixture of Tcl, PHP,
bash, ... And you would not be able to do table joins etc. It's an
absolutely bad idea to depend on a third-party database. You should of
course collect information from various sources, but you need to keep
all the required data in a single database somewhere close to your
app.

Some time ago you had the idea of updating PortIndex just once per
year, remember? Well, here's an example of why you should ideally do it
after each commit rather than once per year. That said, we should ideally
find a way to optimise the PortIndex2json part (probably not your task).
PortIndex already updates just those ports that have actually
changed, but the conversion to JSON regenerates the complete file, and
the web app then has to be fed the full 15 MB of data after each commit.
This is quite suboptimal.
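To make the incremental idea a bit more concrete, here is a rough sketch (not how PortIndex2json actually works) of finding which ports differ between two snapshots, so that only those entries would need to be sent to the web app. It assumes, hypothetically, that each snapshot has already been parsed into a Python dict mapping port name to a dict of that port's fields; `diff_ports` and the sample port names are illustrative only.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical shape: {"portname": {"version": ..., "maintainers": ..., ...}}
PortInfo = Dict[str, Any]


def diff_ports(
    old: Dict[str, PortInfo], new: Dict[str, PortInfo]
) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (added, changed, removed) port names between two snapshots."""
    # dict key views support set operations directly.
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A port counts as changed if any of its fields differ.
    changed = sorted(
        name for name in new.keys() & old.keys() if new[name] != old[name]
    )
    return added, changed, removed


if __name__ == "__main__":
    before = {"foo": {"version": "1.0"}, "bar": {"version": "2.0"}}
    after = {"foo": {"version": "1.1"}, "baz": {"version": "0.1"}}
    print(diff_ports(before, after))
```

The app would then only insert or update the added and changed ports and delete the removed ones, instead of reloading everything after each commit.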

> I got the rough idea of the db from here:
> https://github.com/macports/macports-infrastructure/blob/2129f0cd0eb80f207d2cc62542b65c197733ac51/jobs/PortIndex2PGSQL.tcl#L249

Sure, you can take this as inspiration for the database design, but not as
your definitive source of data. Your definitive source of data should be
PortIndex from the latest git checkout of macports-ports.

> So is this updated regularly? How?

I'm not absolutely sure, but I believe the changes are deployed after
each commit via
    https://build.macports.org/builders/jobs-portindex

> And there seems to be a lot of data already existing, which I could just copy:
> the existing maintainer table, many others.

No. You should replicate that in your app.

You need a separate glue table to connect maintainers with ports, but
that's already part of your database design. You should not store
maintainers in your port table; perform proper joins instead (or maybe
create a view first?).
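A minimal sketch of such a glue table plus a view, using Python's built-in sqlite3 module so it runs without a database server. The table, column and view names (`port`, `maintainer`, `port_maintainer`, `port_with_maintainer`) and the sample rows are hypothetical, not taken from the actual database design. In Django the glue table would normally be created for you by a `ManyToManyField`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE port (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    version TEXT
);
CREATE TABLE maintainer (
    id INTEGER PRIMARY KEY,
    github TEXT UNIQUE NOT NULL
);
-- Glue table: one row per (port, maintainer) pair.
CREATE TABLE port_maintainer (
    port_id INTEGER NOT NULL REFERENCES port(id),
    maintainer_id INTEGER NOT NULL REFERENCES maintainer(id),
    PRIMARY KEY (port_id, maintainer_id)
);
-- View that hides the joins from the rest of the app.
CREATE VIEW port_with_maintainer AS
    SELECT p.name AS port, m.github AS maintainer
    FROM port p
    JOIN port_maintainer pm ON pm.port_id = p.id
    JOIN maintainer m ON m.id = pm.maintainer_id;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO port (id, name, version) VALUES (?, ?, ?)",
        [(1, "foo", "1.0"), (2, "bar", "2.3")],
    )
    conn.executemany(
        "INSERT INTO maintainer (id, github) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    # foo has two maintainers, bar has one.
    conn.executemany(
        "INSERT INTO port_maintainer (port_id, maintainer_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2)],
    )
    return conn


def maintainers_of(conn: sqlite3.Connection, port_name: str) -> list:
    """Return the maintainers of one port, queried through the view."""
    rows = conn.execute(
        "SELECT maintainer FROM port_with_maintainer WHERE port = ? ORDER BY maintainer",
        (port_name,),
    )
    return [row[0] for row in rows]
```

Because maintainers live in their own table, adding a second maintainer to a port is just one more row in the glue table; the port table itself never changes.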

> On 11 May 2018 at 07:11, Vishnu <vishnum1998 at gmail.com> wrote:
>>
>> Hi
>>
>> So should I start working on the same database?
>>
>> The community does not have any suggestions, I guess.
>> So should I go ahead with the existing structure?

*I* do have lots of comments, even if nobody else does. One of the
problems is that the Excel table and your Markdown documents were not in
sync last time I checked (some tables were missing, etc.).

While I'm working on them, can you please:

- Update the ticket descriptions (some of them were empty when I checked,
and some would say "Implement this feature" without actually saying
anything else). Super short descriptions are in principle OK, but only
if we have an additional document describing what will actually be
done. You already spent quite some time writing your proposal, but the
proposal is only visible to GSoC mentors. Can we put your proposal
(probably in Markdown?) next to the database design document? The idea
is that anyone looking at the repository should understand what will
be done (and could in principle pick it up himself), although we might
still want to update the document during the summer. You might know
that feature X should be implemented, but maybe you don't yet know how
to implement it. Before you spend a lot of time coding something that
turns out to be a bad idea and needs to be dropped because it is too
inefficient or too insecure, it makes sense to have a very clear
document of what will be done.

- Remove the fake build statistics from the dynamic app and instead
complete the "static sample page" (or pages) that you started to work
on during your coding challenge, and commit it somewhere to the
repository. You can/should keep the fake data there, but also complete
the page, at least with the info about *what* should be there. You had
just one graph there. You don't need to actually implement lots of
graphs, but you do need to create placeholders (say, just a title and a
description of what the graph that goes there should show).

>> I had a doubt regarding version.
>>
>> How can I check the existing versions of any port,
>> say python27?
>>
>> Is there any list of maintainers I can get?
>>
>> Thanks
>>
>> On 9 May 2018 at 04:06, Vishnu <vishnum1998 at gmail.com> wrote:
>>>
>>> Hi
>>>
>>> And also, I couldn't figure out any way to hide passwords/sensitive
>>> information while creating the app.

Here's one way:
    https://ultimatedjango.com/learn-django/lessons/handling-sensitive-keys/

Of course you need that information on the server where you are
running the application, but the secrets and passwords should not be
stored in a public repository.
What I often do is create something like
    settings.py.sample
or perhaps just
    secrets.py.sample
and commit that one with a fake password to the repository. Then whoever
wants to run the app should first copy the file (removing the .sample
extension), enter the correct secret data and only then run the app.

Again: you do need to have this information stored somewhere, it just
must not leak to a public repository. If you commit settings.py with
fake passwords and then put the real password into that same file, you
might accidentally commit the change one day, so it's better to have a
separate file.
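One way to wire this up on the Python side, as a sketch only: the settings code asks for each secret by name, first from an environment variable and then from an untracked module. The module name `local_secrets` and the helper `load_secret` are hypothetical. Note that a file literally called `secrets.py` would shadow Python's standard-library `secrets` module (Python 3.6+), so a different name is safer. The real `local_secrets.py` would be listed in `.gitignore`, with only `local_secrets.py.sample` committed.

```python
import importlib
import os


def load_secret(name, default=None):
    """Look up a secret: environment variable first, then local_secrets.py.

    local_secrets.py is a hypothetical, untracked copy of
    local_secrets.py.sample that holds the real values.
    """
    value = os.environ.get(name)
    if value is not None:
        return value
    try:
        module = importlib.import_module("local_secrets")
    except ImportError:
        module = None  # no local file; fall through to the default
    if module is not None and hasattr(module, name):
        return getattr(module, name)
    if default is not None:
        return default
    raise RuntimeError(
        f"Missing secret {name!r}: set it in the environment or copy "
        "local_secrets.py.sample to local_secrets.py and fill it in"
    )


# Example use in settings.py (hypothetical setting name):
# DATABASE_PASSWORD = load_secret("DB_PASSWORD")
```

Failing loudly when a secret is missing makes a forgotten copy step obvious at startup, instead of the app silently running with a fake password.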

>>> On 9 May 2018 at 03:43, Vishnu <vishnum1998 at gmail.com> wrote:
>>>>
>>>> I had one doubt.
>>>> Should I switch the link in my Heroku account for integration with the MacPorts
>>>> GitHub?

I'm not sure what it is that you are asking. If Heroku needs special
privileges on GitHub (what permissions are required?), it might be
best to create an additional user on GitHub for now. Can you provide
some pointers? It's pointless for me to theorise about the options
before I know what is required.

>>>> Because I think you would then need to give Heroku access to your account.
>>>>
>>>> I think it would be wise for me to do the commit updates in my local
>>>> repository itself.
>>>>
>>>> Once every 2 weeks or so, I'll push all the changes to the MacPorts
>>>> repository.
>>>> Do comment. What should be done?

I wanted to suggest that the app be developed either in a
separate branch or in your fork of the repository (not in the master
branch, please!), and that you then make pull requests on a regular
basis; we would review each pull request and make sure that any code
that ends up in the repository has been fully reviewed / tested.

If you commit everything (including tests) straight to the master
branch of the main repository, it's more challenging to track which
code reviews have already been taken into account and which ones have
not. You should not make giant pull requests anyway (ideally you would
make a PR on an approximately daily basis, or at least once per week in
case you need to figure out a number of things and the code is not yet
ready; making one giant PR every two weeks or - god forbid - once per
month or only at the end of GSoC might cause too much trouble).

Mojca


More information about the macports-dev mailing list