Gsoc 18 Project | Collect build statistics
Mojca Miklavec
mojca at macports.org
Mon May 14 05:35:07 UTC 2018
Dear Vishnu,
On 13 May 2018 at 09:37, Vishnu wrote:
> Hey
>
> Mojca you said once that you have lot to comment on the current database.
> So it would be great if you could do that.
I left a comment on
https://github.com/macports/macports-webapp/issues/2
and made a bunch of changes and suggestions to
https://github.com/macports/macports-webapp/blob/master/docs/Database_Design.md
(I would still like to go through installation statistics and build
statistics in slightly more detail.)
Since the idea was to spend the whole week heavily concentrated on the
database design and implementation (which also includes planning,
documenting, changing, testing etc.), my suggestion would be to do the
following (but feel free to adapt it):
(0) Given the limitations of Heroku, what you can do is take
approximately 1000 ports for the initial tests. A quick trick could be
to loop through the file with something like "while
lowercase(first_letter_of_portname) < 'f'". No need to spend any extra
time on this, it's just a simple suggestion for a quick workaround. You can
of course also cut the portindex file manually for now.
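A minimal sketch of that first-letter filter in Python. It assumes the port index has already been parsed into a list of dicts with a "name" key; that shape, the function name take_subset and the sample port names are only illustrative, so adapt them to whatever your portindex.json actually contains.

```python
import json


def take_subset(ports, upper_bound="f"):
    """Keep only ports whose name starts with a character sorting before `upper_bound`."""
    return [p for p in ports if p["name"][:1].lower() < upper_bound]


# Hypothetical sample data in the assumed shape (a list of {"name": ...} dicts).
sample = json.loads(
    '[{"name": "Aspell"}, {"name": "cmake"}, {"name": "zlib"}, {"name": "2ping"}]'
)
subset = take_subset(sample)
print([p["name"] for p in subset])
```

Note that names starting with a digit also pass the comparison, because digits sort before letters. Moving upper_bound to a later or earlier letter is an easy way to get closer to 1000 ports.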
(1) So far you have implemented one table with portindex. Before you
end up implementing the full database schema from the ground up, you
can probably try the first and easiest many-to-many relationship,
"categories". Change the current app to add a new table "categories"
and, while you are reading the database of ports:
- create or update categories as ports list them
- create entries in port_category
- the port page should list all categories to which this port belongs
- these categories should be hyperlinks to a page listing all ports
belonging to that particular category
so basically something similar to what https://www.macports.org/ports.php does.
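To make the many-to-many idea concrete, here is a rough sketch using plain sqlite3 from the Python standard library rather than Django's ORM, only so that it runs on its own. The table and column names (port, category, port_category) and the helper functions are assumptions for illustration, not the schema from the design document. In Django the same relationship would normally be declared with a ManyToManyField.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE port (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE port_category (
    port_id INTEGER NOT NULL REFERENCES port(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (port_id, category_id)
);
""")


def add_port(conn, name, categories):
    """Insert a port, create any categories it lists, and link them."""
    conn.execute("INSERT OR IGNORE INTO port (name) VALUES (?)", (name,))
    port_id = conn.execute(
        "SELECT id FROM port WHERE name = ?", (name,)).fetchone()[0]
    for cat in categories:
        conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (cat,))
        cat_id = conn.execute(
            "SELECT id FROM category WHERE name = ?", (cat,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO port_category VALUES (?, ?)", (port_id, cat_id))


def categories_of(conn, port_name):
    """Categories to show on the page of a single port."""
    rows = conn.execute(
        "SELECT c.name FROM category c"
        " JOIN port_category pc ON pc.category_id = c.id"
        " JOIN port p ON p.id = pc.port_id"
        " WHERE p.name = ? ORDER BY c.name", (port_name,))
    return [r[0] for r in rows]


def ports_in(conn, category_name):
    """Ports to list on the page of a single category."""
    rows = conn.execute(
        "SELECT p.name FROM port p"
        " JOIN port_category pc ON pc.port_id = p.id"
        " JOIN category c ON c.id = pc.category_id"
        " WHERE c.name = ? ORDER BY p.name", (category_name,))
    return [r[0] for r in rows]


# Hypothetical sample ports and categories.
add_port(conn, "cmake", ["devel"])
add_port(conn, "aspell", ["textproc", "devel"])
print(categories_of(conn, "aspell"))
print(ports_in(conn, "devel"))
```

The two query helpers correspond to the two pages mentioned above: the port page listing its categories, and the category page listing its ports.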
(2) I would strongly suggest figuring out how to do unit tests
and starting to write some tests. A random URL:
https://realpython.com/testing-in-django-part-1-best-practices-and-examples/
but don't necessarily stick to this URL/suggestion, it's just one of
the first few hits, you might find better resources elsewhere.
What would be nice to test at this stage is for example the following scenario:
- prepare a simple, stripped-down version of portindex.json (maybe
just 5 ports or so)
- import them to a clean database (you should have a different
database for running unit tests than "production" code)
- check that you have the correct number of entries in all tables,
check for some values that are potentially tricky to get right
- prepare another simple portindex.json with one new port, one deleted
port, one updated port, apply that one and make sure that you are
still getting the expected number of entries in all tables, and that
what you are getting is still correct
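The scenario above could look roughly like this with Python's unittest module. This is only a sketch under assumptions: sync_ports is a hypothetical stand-in for your import code, the single "port" table is much simpler than the real design, and the port names and versions are made up. In the Django app you would use django.test.TestCase instead, which gives you a separate test database automatically.

```python
import sqlite3
import unittest


def sync_ports(conn, ports):
    """Hypothetical importer: make the port table match the given list of ports."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS port (name TEXT PRIMARY KEY, version TEXT)")
    for p in ports:
        conn.execute(
            "INSERT OR REPLACE INTO port (name, version) VALUES (?, ?)",
            (p["name"], p["version"]))
    # Remove ports that are no longer present in the index.
    names = [p["name"] for p in ports]
    placeholders = ",".join("?" * len(names))
    conn.execute(f"DELETE FROM port WHERE name NOT IN ({placeholders})", names)


# Stripped-down index with 5 made-up entries.
FIRST = [{"name": n, "version": "1.0"}
         for n in ("aspell", "bzip2", "cmake", "curl", "zlib")]
# Second index: "git" is new, "bzip2" is deleted, "curl" is updated.
SECOND = [
    {"name": "aspell", "version": "1.0"},
    {"name": "cmake", "version": "1.0"},
    {"name": "curl", "version": "2.0"},
    {"name": "zlib", "version": "1.0"},
    {"name": "git", "version": "1.0"},
]


class ImportTests(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database for every test, never the production one.
        self.conn = sqlite3.connect(":memory:")
        sync_ports(self.conn, FIRST)

    def tearDown(self):
        self.conn.close()

    def rows(self):
        return dict(self.conn.execute("SELECT name, version FROM port"))

    def test_initial_import(self):
        self.assertEqual(len(self.rows()), 5)

    def test_second_import(self):
        sync_ports(self.conn, SECOND)
        rows = self.rows()
        self.assertEqual(len(rows), 5)
        self.assertIn("git", rows)
        self.assertNotIn("bzip2", rows)
        self.assertEqual(rows["curl"], "2.0")
```

If this is saved in a file, "python -m unittest" on that file runs both tests.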
(3) I didn't check the recent changes in detail yet, but you should
make sure that the application keeps working properly when you read
portindex.json multiple times. Last time the database would create a
new entry for the same port. If that port exists, you should update
the entry instead of creating a new one (Django's querysets have a
method update_or_create() for this, I think).
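To illustrate the update-instead-of-create behaviour without depending on Django, here is a small sqlite3 sketch. The helper update_or_create_port, the two-column table and the sample data are assumptions made up for this example. The point is only that importing the same index several times must leave the row count unchanged.

```python
import sqlite3


def update_or_create_port(conn, name, version):
    """Update the port if it exists, otherwise insert it.

    Returns True if a new row was created, False if an existing one was updated.
    """
    cur = conn.execute(
        "UPDATE port SET version = ? WHERE name = ?", (version, name))
    if cur.rowcount:
        return False
    conn.execute(
        "INSERT INTO port (name, version) VALUES (?, ?)", (name, version))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE port (name TEXT PRIMARY KEY, version TEXT)")

# Made-up index entries; reading them three times simulates repeated imports.
index = [{"name": "curl", "version": "1.0"}, {"name": "zlib", "version": "1.0"}]
for _ in range(3):
    for p in index:
        update_or_create_port(conn, p["name"], p["version"])

count = conn.execute("SELECT COUNT(*) FROM port").fetchone()[0]
print(count)
```

A check like this (same input applied twice, same number of rows afterwards) is also a good candidate for one of the first unit tests.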
Of course feel free to continue implementing tables from the design
document, but instead of rushing into creation of all tables at once,
it makes the most sense to test them incrementally and have good coverage
with unit tests to avoid breaking stuff as you keep developing.
Since you started working on portindex anyway, it's probably a lot
more important to have this part of the database well implemented and
tested (+ of course having the database design document ready for
further steps) than to rush into setting up all the tables at once.
See also
https://github.com/umeshksingla/macports-stats
which has been created during / immediately after the MacPorts meeting
in Slovenia two months ago as we were brainstorming on this idea. It's
just a quick proof-of-concept repository, but you might find some
ideas there.
Mojca