GSoC 2019 [Collect build statistics]

Arjun Salyan arjun.salyan.che17 at itbhu.ac.in
Thu Mar 28 21:11:40 UTC 2019


On Thu, Mar 28, 2019 at 12:05 PM Mojca Miklavec <mojca at macports.org> wrote:

> What if there's a server outage?
>

Then the best approach is to use HttpStatusPush for instant updates and, so
that no build is missed because of a server failure, also run our fetching
script once per day. The script can easily check whether any build number
present in the logs is absent from the database.
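
A minimal sketch of that daily check, assuming the build numbers from the
logs and from the database have already been collected into two lists (the
function name and its inputs are hypothetical, not part of any existing
script):

```python
def find_missing_builds(log_build_numbers, db_build_numbers):
    """Return the build numbers seen in the logs but absent from the database.

    Both arguments are plain iterables of build numbers; how they are
    fetched (buildbot logs, SQL query) is outside this sketch.
    """
    return sorted(set(log_build_numbers) - set(db_build_numbers))


# Example: build 3 was missed while the server was down.
missing = find_missing_builds([1, 2, 3, 4], [1, 2, 4])
```

The daily script would then fetch and store only the builds listed in
`missing`.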


> (3) The database needs to be designed in such a way (and the software
> needs to be written in such a way) that frequent updates of the full
> portindex2json:
>     (a) works correctly (ports missing from PortIndex are marked as
> gone, no duplicate entries of ports, all info up-to-date)
>     (b) works super efficiently
>     (c) works with minimal overhead
> If network speed is the bottleneck, make sure that you feed / update
> the database from the same machine where the database is running.
> Updating via git is super fast, you want to avoid transferring the
> full 20MB file over network over and over again. Even if the testing
> system is running at strange configurations, suggest the architecture
> of how it would ideally be implemented if you can design the system
> and architecture yourself.
>
To keep an up-to-date copy of portindex.json, this seems a reasonable approach:

   - Generate the portindex.json file along with PortIndex, i.e. run
   portindex2json.tcl on our own. [ this would also help in our discussion
   with repology ]
   - portindex.json can be stored in the same directory as PortIndex, and if
   we run our web-app on a different machine [ which is the most probable case
   ] then we could keep the web-app's copy of portindex.json updated using
   rsync [ I think repology does the same, but I am not sure ].
   - Then, using os.stat on the web-app's copy of portindex.json, we can
   keep checking the file's 'last modified' time and hence detect whether
   it has changed.
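
The os.stat check in the last step could look roughly like this. It is only
a sketch: the function name is made up, and the caller is assumed to
remember the modification time it saw on the previous check.

```python
import os


def has_changed(path, last_seen_mtime):
    """Return (changed, current_mtime) for the file at `path`.

    `last_seen_mtime` is the st_mtime recorded on the previous check,
    or None if the file has never been checked before.
    """
    current_mtime = os.stat(path).st_mtime
    return current_mtime != last_seen_mtime, current_mtime
```

The web-app would call this on a timer, store the returned mtime, and
re-read portindex.json only when `changed` is true.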

Now that we have an updated copy of portindex.json, we go back to our build
history, which is constantly receiving updates from the server [ without
delay if everything is fine, and with some delay in case of a server outage ],
detect which ports have been built recently, and update the database for
those ports using portindex.json.
To make sure the two do not drift apart, we can schedule a weekly 'complete
syncing of database and portindex.json'.
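
A rough sketch of both steps with sqlite3, under stated assumptions: the
table layout, the function names, and the shape of the parsed index (a dict
mapping port name to a dict with a "version" key) are hypothetical and are
not the real portindex.json schema.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical ports table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ports ("
        "name TEXT PRIMARY KEY, version TEXT, active INTEGER DEFAULT 1)"
    )
    return conn


def update_ports(conn, index, names):
    """Refresh only the given ports (e.g. the recently built ones)."""
    for name in names:
        info = index.get(name)
        if info is None:
            continue  # not in the index; the full sync handles removals
        # The primary key on name means a port is replaced, never duplicated.
        conn.execute(
            "INSERT OR REPLACE INTO ports (name, version, active) "
            "VALUES (?, ?, 1)",
            (name, info["version"]),
        )
    conn.commit()


def full_sync(conn, index):
    """Weekly pass: refresh every port and flag ports gone from the index."""
    update_ports(conn, index, list(index))
    known = [row[0] for row in conn.execute("SELECT name FROM ports")]
    for name in known:
        if name not in index:
            conn.execute("UPDATE ports SET active = 0 WHERE name = ?", (name,))
    conn.commit()
```

Here `update_ports` would run whenever the build history reports new
builds, and `full_sync` would run from the weekly schedule.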



> (4) Suggest a way to minimize the data transfer, so that it will only
> include the changes rather than the full data set. How to get such
> data? What would need to be changed / improved?


rsync would do exactly this, since over the network it transfers only the
parts of a file that differ between the two copies.

> (5) You won't be getting port renames. What you do get is
> "replaced_by" information at best (say, perl5.26 could be replaced_by
> perl5.28). When a port is renamed, treat it as a different port, but
> the old port could be marked as "inactive" and "replaced_by <which
> port>" (if it's not deleted yet). This information is probably not in
> PortIndex, either portindex would need to be improved, or you need to
> find a different way.
>

Okay! So the name change problem can be handled. We can have a
"replaced_by" column in our table: as long as it is empty / NULL the port is
active; otherwise it is inactive and has been replaced by the named port.
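
As an illustration only, with a hypothetical two-column table in sqlite3
(the real schema would have more columns), the rule reads:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (name TEXT PRIMARY KEY, replaced_by TEXT)")
conn.executemany(
    "INSERT INTO ports (name, replaced_by) VALUES (?, ?)",
    [("perl5.26", "perl5.28"), ("perl5.28", None)],
)


def is_active(conn, name):
    """A port is active while its replaced_by column is empty or NULL."""
    row = conn.execute(
        "SELECT replaced_by FROM ports WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)  # unknown port
    return not row[0]
```

With the sample rows above, perl5.26 is reported as inactive and perl5.28
as active.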

Please let me know if these approaches look fine.

Thank You