Gsoc 18 Project | Collect build statistics

Vishnu vishnum1998 at gmail.com
Sat May 12 08:34:39 UTC 2018


Hi

I am not saying that my DB and the existing DB would be interdependent.
Rather, I am saying that I could copy the content to my database just
once.
After that I would write the code to keep it updated whenever something is
added, deleted, or modified.

Take maintainers, for example.
You already have a table of maintainers and their ports, so I can copy that
data to my database.
Then I will write a script similar to PortIndex2PGSQL.tcl to keep
updating my database independently of the existing database.

Processing the 15 MB JSON file again and again would be bothersome.
We definitely need some way of getting a differential JSON, so that only
the changes caused by a commit to a Portfile are applied.
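
A minimal sketch of the differential idea, assuming the JSON export is a
list of port records that each carry a "name" field (an assumption; the
real export format is not shown in this thread). The function name
diff_ports and the sample records are hypothetical.

```python
import json


def diff_ports(old_ports, new_ports, key="name"):
    """Return (added, removed, modified) between two port snapshots.

    Assumes each snapshot is a list of dicts with a unique `key` field.
    """
    old_by_name = {p[key]: p for p in old_ports}
    new_by_name = {p[key]: p for p in new_ports}

    # Ports present only in the new snapshot.
    added = [new_by_name[n] for n in sorted(new_by_name.keys() - old_by_name.keys())]
    # Names present only in the old snapshot.
    removed = sorted(old_by_name.keys() - new_by_name.keys())
    # Ports present in both snapshots whose record changed.
    modified = [
        new_by_name[n]
        for n in sorted(new_by_name.keys() & old_by_name.keys())
        if new_by_name[n] != old_by_name[n]
    ]
    return added, removed, modified


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    old = [{"name": "python27", "version": "1"}, {"name": "zlib", "version": "1"}]
    new = [{"name": "python27", "version": "2"}, {"name": "curl", "version": "1"}]
    added, removed, modified = diff_ports(old, new)
    print(json.dumps({"added": added, "removed": removed, "modified": modified}, indent=2))
```

Only the three resulting lists would then need to be sent to the web app,
instead of the full file.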

Or maybe we could change buildbot so that whenever a Portfile changes, it
also updates my port table with the changes.
I am not sure how to do this; it is just an idea for now.

"You need a separate glue table to connect maintainers with ports. But
that's already part of your database design. You should not store
maintainers in your port table, but perform correct joins (or maybe
create a view first?) instead."

You misunderstood me. By copying the data I meant a one-time copy.
After that I would keep updating the table with whatever changes happen.
And I will definitely use joins or views.
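
A minimal sketch of the glue table plus view, using SQLite in memory only
so that the example is self-contained (the real app would use its own
database). All table, column, and view names and the sample rows are
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE port (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE maintainer (id INTEGER PRIMARY KEY, handle TEXT UNIQUE NOT NULL);

-- Glue table: one row per (port, maintainer) pair.
CREATE TABLE port_maintainer (
    port_id INTEGER NOT NULL REFERENCES port(id),
    maintainer_id INTEGER NOT NULL REFERENCES maintainer(id),
    PRIMARY KEY (port_id, maintainer_id)
);

-- View that hides the two joins from the rest of the app.
CREATE VIEW port_with_maintainers AS
    SELECT p.name AS port, m.handle AS maintainer
    FROM port p
    JOIN port_maintainer pm ON pm.port_id = p.id
    JOIN maintainer m ON m.id = pm.maintainer_id;
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO port (id, name) VALUES (?, ?)",
                 [(1, "python27"), (2, "zlib")])
conn.executemany("INSERT INTO maintainer (id, handle) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO port_maintainer VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

rows = conn.execute(
    "SELECT port, maintainer FROM port_with_maintainers ORDER BY port, maintainer"
).fetchall()
print(rows)
```

With this layout no maintainer is stored in the port table itself, and a
port can have any number of maintainers.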

I will be doing the following things:
- Update ticket descriptions
- Put the GSoC proposal in docs
- Update the graph on the static website
- Try to handle sensitive keys
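
For the last item, a minimal sketch of the separate-secrets-file approach
suggested in the quoted message below: only a ".sample" file with fake
values is committed, and the real file is loaded at startup. The file name
local_secrets.py and the function load_secrets are hypothetical (the name
secrets.py is avoided here because it would shadow Python's standard
"secrets" module).

```python
import os
import runpy
import tempfile


def load_secrets(path):
    """Load UPPERCASE settings from an uncommitted Python file."""
    if not os.path.exists(path):
        raise RuntimeError(
            f"{path} not found: copy {path}.sample to {path} "
            "and fill in the real values"
        )
    # run_path executes the file and returns its global namespace as a dict.
    namespace = runpy.run_path(path)
    return {k: v for k, v in namespace.items() if k.isupper()}


if __name__ == "__main__":
    # Demonstration with a throwaway file holding fake values.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "local_secrets.py")
        with open(path, "w") as f:
            f.write('SECRET_KEY = "not-a-real-key"\n')
            f.write('DB_PASSWORD = "not-a-real-password"\n')
        print(sorted(load_secrets(path)))
```

The real file would be listed in .gitignore so that it cannot be committed
by accident.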


Regarding Heroku: don't worry about that. I have figured it out.

And regarding working on a separate branch: Rainer suggested not to, because
daily pull requests would have to be merged by the mentor.

Would committing directly to master be a problem?
Only I will be dealing with the repo, and probably no one will clone the
repository right now.

It is also much easier to commit directly to master than to create a PR.



One doubt I still have is whether portindex2postgres.tcl processes the
entire MacPorts ports tree every time it updates the database, or just
the port that has changed?


Thanks

On 11 May 2018 at 12:47, Mojca Miklavec <mojca at macports.org> wrote:

> On 11 May 2018 at 04:00, Vishnu wrote:
> > Hi
> >
> > Also i wanted to know
> > For this link
> > https://www.macports.org/ports.php?by=name&substr=python27
> >
> > There is already existing database with loads of information.
> > Is it not updated?
> > is  it static ?
>
> It is not static, the database is updated via an external script from
> PortIndex and the page is rendered with php code, reading information
> from the database.
>
> > If it is updated automatically...Then half of the work for database would
> > already be over.
> >
> > Could i get access to the DB?
>
> Once your application reaches a certain quality & completeness, it
> should render the above page obsolete and the above page will most
> likely be removed.
>
> While we could theoretically give you access to the database (not
> without opening some security holes): first of all this hardly makes
> any sense because you should always be able to reproduce the exact
> state of the database if you run PortIndex2PGSQL.tcl yourself, but
> most importantly we don't want to introduce a complex network of
> interdependencies.
>
> Let's say that I need to change the new app a few months from now.
> Before I can deploy the change I need to test it locally on my
> machine. If your app requires password-protected or IP-limited
> access to some third-party database, this would greatly hinder my
> ability to patch the new app. The app should be as standalone as
> possible. The next very important aspect is that you might want to
> include more information than what's currently available in that
> database. For example, you might decide to include the information
> about which port supports which version of macOS. That information is
> not available in the existing database, so you would then first need
> to patch who-knows-what-other-code in random mixture of tcl, php,
> bash, ... And you would not be able to do table joins etc. It's an
> absolutely bad idea to depend on a third-party database. You should of
> course collect information from various sources, but you need to keep
> all the required data in a single database somewhere close to your
> app.
>
> Some time ago you had ideas about updating portindex just once per
> year? Well, here's an example why you should ideally do it after each
> commit rather than once per year. That said, we should ideally find a
> way to optimise the PortIndex2json part (probably not your task).
> PortIndex already updates just those ports that have in fact been
> changed, but conversion to json updates the complete file and needs to
> feed the web app with full 15 MB of data after each commit. This is
> quite a bit suboptimal.
>
> > I got the rough idea of the db from here:
> > https://github.com/macports/macports-infrastructure/blob/2129f0cd0eb80f207d2cc62542b65c197733ac51/jobs/PortIndex2PGSQL.tcl#L249
>
> Sure, you can take this as inspiration for database design, but not as
> your definite source of data. Your definite source of data should be
> PortIndex from the latest git checkout of macports-ports.
>
> > So is this updated regularly?How?
>
> I'm not absolutely sure, but I believe the changes are deployed after
> each commit via
>     https://build.macports.org/builders/jobs-portindex
>
> > And there seems to be lot of data already existing. Which i could just
> > copy.
> > Existing maintainer table,many other.
>
> No. You should replicate that in your app.
>
> You need a separate glue table to connect maintainers with ports. But
> that's already part of your database design. You should not store
> maintainers in your port table, but perform correct joins (or maybe
> create a view first?) instead.
>
> > On 11 May 2018 at 07:11, Vishnu <vishnum1998 at gmail.com> wrote:
> >>
> >> Hi
> >>
> >> So should I start working on the same database?
> >>
> >> Community does not have any suggestions I guess.
> >> So should i go ahead with the existing structure?
>
> *I* do have lots of comments even if nobody else does. One of the
> problems is that the Excel table and your md documents were not in
> sync last time I checked (some tables were missing etc.)
>
> While I'm working on them, can you please:
>
> - Update ticket descriptions (some of them were empty when I checked,
> some would say "Implement this feature" without actually telling
> anything else). Super short descriptions are in principle ok, but only
> if we have an additional document describing what will actually be
> done. You already spent quite some time writing your proposal, but the
> proposal is only visible to GSOC mentors. Can we put your proposal
> (probably in markdown?) next to database design document? The idea is
> that anyone looking at the repository should have an understanding of
> what will be done (and could in principle pick it up himself), but we
> might want to update the document during the summer. While you might
> know that feature X should be implemented, maybe you don't know yet
> how to implement it and before you spend a lot of time on coding
> something that would be a bad idea and would need to be dropped
> because it would be too inefficient or too insecure, it makes sense to
> have a very clear document of what will be done.
>
> - Remove fake build statistics from the dynamic app and instead
> complete the "static sample page" (or pages) that you started to work
> on during your coding challenge, and commit it somewhere to the
> repository. You can/should keep the fake data there, but also complete
> the page, at least with the info about *what* should be there. You had
> just one graph there. You don't need to actually implement lots of
> graphs, but you need to create placeholder (say, just a title and
> description about what the graph to come there should show).
>
> >> I had a doubt regarding version.
> >>
> >> How can i check the existing versions of any port.
> >> Say python27?
> >>
> >> Any list of maintainers i can get?
> >>
> >> Thanks
> >>
> >> On 9 May 2018 at 04:06, Vishnu <vishnum1998 at gmail.com> wrote:
> >>>
> >>> Hi
> >>>
> >>> And also i couldn't figure out any way to hide passwords/ Sensitive
> >>> information while creating app.
>
> Here's one way:
>     https://ultimatedjango.com/learn-django/lessons/handling-sensitive-keys/
>
> Of course you need that information on the server where you are
> running the application, but the secrets and passwords should not be
> stored in a public repository.
> What I often do is create something like
>     settings.py.sample
> or perhaps just
>     secrets.py.sample
> and commit that one with a fake password to repository. Then, whoever
> wants to run the app, should first copy the file (removing the .sample
> extension), enter the correct secret data and only then run the app.
>
> Again: you do need to have this information stored somewhere, it just
> may not leak to a public repository. If you commit settings.py with
> fake passwords and correct the password on that one file, you might
> accidentally commit the change one day, so it's better to have a
> separate file.
>
> >>> On 9 May 2018 at 03:43, Vishnu <vishnum1998 at gmail.com> wrote:
> >>>>
> >>>> I had one doubt.
> >>>> Should i switch the link in heroku account for integration with
> >>>> macports github ?
>
> I'm not sure what it is that you are asking. If Heroku needs special
> privileges on GitHub (what permissions are required?), it might be
> best to create an additional user on GitHub for now. Can you provide
> some pointers? It's pointless for me to theorise about the options
> before I know what is required.
>
> >>>> Because i think then you need to give accesss to heroku of your
> >>>> account.
> >>>>
> >>>> I think it would be wise for me to do the commit update in my local
> >>>> repository itself..
> >>>>
> >>>> Once every 2 weeks or something ill push all the changes to macports
> >>>> repository.
> >>>> Do comment .What should be done?
>
> I wanted to suggest that the app would be developed either in a
> separate branch or in your fork of repository (not in master branch,
> please!), and then you would make pull requests on regular basis and
> we would review the pull request and make sure that any code that ends
> up in repository has been fully reviewed / tested.
>
> If you commit everything (including tests) straight to the master
> branch of the main repository, it's more challenging to track which
> code reviews have already been taken into account and which ones were
> not. You should not make giant pull requests anyway (ideally you would
> make a PR on approximately daily basis or at least once per week in
> case you need to figure out a number of things and code is not yet
> ready; making one giant PR every two weeks or - god forbid - once per
> month or only at the end of GSOC might cause too many troubles).
>
> Mojca
>