GSoC 18 Project | Collect build statistics

Mojca Miklavec mojca at macports.org
Sat May 12 15:27:14 UTC 2018


On 12 May 2018 at 14:17, Rainer Müller wrote:
> On 2018-05-12 10:34, Vishnu wrote:
>> I am not saying that my DB and the existing DB would be interdependent.
>> Rather, I am saying that I can copy the content to my database just
>> once, and then write code to keep updating it whenever something is
>> added, deleted or modified.
>>
>> Take maintainers, for example.
>> You already have a table of maintainers and their ports, so I can copy
>> the data to my database.

No, you should get that data from portindex.json rather than copying
it from another database.
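A minimal sketch of deriving a maintainer-to-ports table directly from the JSON export, so nothing has to be copied from another database. It assumes (unverified) that portindex.json has a top-level "ports" list whose entries carry a "name" string and an optional "maintainers" list of strings; the real schema written by portindex2json.tcl may differ, and the function names are hypothetical.

```python
import json
from collections import defaultdict


def maintainers_to_ports(index: dict) -> dict[str, list[str]]:
    """Map each maintainer to the sorted list of ports they maintain.

    ASSUMPTION: `index` has a top-level "ports" list; each port is a dict
    with a "name" string and an optional "maintainers" list of strings.
    """
    table = defaultdict(list)
    for port in index.get("ports", []):
        for maintainer in port.get("maintainers", []):
            table[maintainer].append(port["name"])
    return {m: sorted(names) for m, names in table.items()}


def load_index(path: str = "portindex.json") -> dict:
    # Read the JSON export; the default path is illustrative only.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Hypothetical sample data, not real MacPorts maintainers.
    sample = {
        "ports": [
            {"name": "port-b", "maintainers": ["alice"]},
            {"name": "port-a", "maintainers": ["alice", "bob"]},
            {"name": "orphan-port"},
        ]
    }
    print(maintainers_to_ports(sample))
```

Because the table is rebuilt from the index each time, it never drifts out of sync with the ports tree.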

>> Then I will write a script similar to PortIndex2PGSQL.tcl to keep
>> updating my database independently of the existing database.
>
> Our database contains exactly what the portindex2postgres.tcl script
> produces (note the different naming). You can just generate the data
> locally by running it against PortIndex.
>
> You need to have the ports tree in a directory named "ports", then run
> the following command from the parent directory:
>
>   port-tclsh /path/to/macports-infra/jobs/portindex2postgres.tcl
>
> This will create a PortIndex.sql in the current directory. Ignore or
> remove the macports.conf and sources.conf files that are also created.

But the information is also nearly identical to what one gets with
portindex2json.tcl, and if there are any issues (there are some), I
would say that we need to fix portindex2json instead of using fifty
different ways to get nearly the same information into our database.

>> Processing the 15 MB JSON file again and again would be cumbersome.
>> We definitely need some way of getting a differential JSON that
>> contains only the changes introduced by commits to Portfiles.
/.../
>> One doubt I still have is whether portindex2postgres.tcl processes
>> the entire MacPorts ports tree into the database every time?
>> Or just the ports that have changed?
>
> The PortIndex file itself is generated with the 'portindex' command,
> which only updates the data for Portfiles that have a modification time
> that is newer than the existing PortIndex file.
>
> The portindex2postgres.tcl script merely converts the PortIndex file
> from the custom format based on Tcl lists to SQL. The output will always
> contain SQL statements with the full data for every port.
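The modification-time rule described above can be illustrated with a short sketch that selects only the Portfiles newer than the existing PortIndex. This is a Python illustration under assumptions, not the code that `portindex` actually runs; the `<category>/<port>/Portfile` layout and the function name `stale_portfiles` are assumed for the example.

```python
import os
from pathlib import Path


def stale_portfiles(ports_root: str, index_path: str) -> list[Path]:
    """Return Portfiles modified after the PortIndex file was last written.

    Illustration of the mtime rule only (hypothetical helper). Assumes the
    usual <category>/<port>/Portfile layout under `ports_root`.
    """
    root = Path(ports_root)
    try:
        index_mtime = os.path.getmtime(index_path)
    except FileNotFoundError:
        # No PortIndex yet: every Portfile has to be indexed.
        index_mtime = float("-inf")
    return sorted(
        p for p in root.glob("*/*/Portfile")
        if p.stat().st_mtime > index_mtime
    )
```

Note that this rule alone only detects new and modified Portfiles; a deleted Portfile leaves no file behind whose timestamp could be compared.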

I would really, really like to see functionality like
    portindex --mode=diff --format=json
that would produce output like

{
    "new": [...],
    "removed": [...],
    "updated": [...]
}

which would only contain the differences since the last run of
portindex, and then we would push only this data to the database.
Since portindex already knows which files are new, this doesn't sound
impossible to implement. It would be a lot more efficient than
transmitting the full data, but sure, sending the full data works as
well. In that case I assume that the command which updates the ports
would need to mark all ports as "potentially deleted" before reading
and storing the data.
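To make both options concrete, here is a minimal sketch. `diff_snapshots` computes the proposed `new`/`removed`/`updated` lists from two full snapshots, `apply_diff` pushes only those changes, and `apply_full_snapshot` stores full data using the "mark everything as potentially deleted, then unmark what is seen" approach. Everything here is hypothetical: the `--mode=diff` option does not exist, and the snapshot shape (port name mapped to a metadata dict), the SQLite table `ports(name, data, deleted)` and the function names are assumptions for illustration.

```python
import json
import sqlite3

# ASSUMED schema: CREATE TABLE ports (name TEXT PRIMARY KEY, data TEXT, deleted INTEGER)


def diff_snapshots(old: dict, new: dict) -> dict[str, list[str]]:
    """Compare two {port name: metadata} snapshots (hypothetical shape)."""
    return {
        "new": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "updated": sorted(n for n in new.keys() & old.keys() if new[n] != old[n]),
    }


def _upsert(conn: sqlite3.Connection, name: str, meta: dict) -> None:
    # Insert the port, or replace its row, and clear any deletion mark.
    conn.execute(
        "INSERT OR REPLACE INTO ports (name, data, deleted) VALUES (?, ?, 0)",
        (name, json.dumps(meta, sort_keys=True)),
    )


def apply_diff(conn: sqlite3.Connection, changes: dict, snapshot: dict) -> None:
    """Push only changed ports; `snapshot` supplies data for new/updated ones."""
    with conn:  # one transaction: commit on success, roll back on error
        for name in changes["new"] + changes["updated"]:
            _upsert(conn, name, snapshot[name])
        conn.executemany(
            "UPDATE ports SET deleted = 1 WHERE name = ?",
            [(n,) for n in changes["removed"]],
        )


def apply_full_snapshot(conn: sqlite3.Connection, snapshot: dict) -> None:
    """Store full data, detecting removed ports by mark-and-sweep."""
    with conn:
        # 1. Mark every known port as potentially deleted.
        conn.execute("UPDATE ports SET deleted = 1")
        # 2. Re-store every port that is still present, clearing its mark.
        for name, meta in snapshot.items():
            _upsert(conn, name, meta)
    # Rows still marked deleted = 1 are ports that vanished from the tree.
```

For example, with `v1 = {"foo": {"version": "1.0"}, "bar": {"version": "2.0"}}` and `v2 = {"foo": {"version": "1.1"}, "baz": {"version": "0.1"}}`, `diff_snapshots(v1, v2)` returns `{'new': ['baz'], 'removed': ['bar'], 'updated': ['foo']}`. Applying that diff after loading `v1` leaves the table in the same state as loading the full `v2` snapshot, while touching only three rows.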

Of course, Vishnu, you should not wait for this functionality to be
implemented, I'm just saying that it would be super nice if it was
(figuring out how to implement it yourself would get you too
side-tracked, so it's probably not a good idea).

Mojca


More information about the macports-dev mailing list