portindex ignores (filters out) unchanged port

René J.V. Bertin rjvbertin at gmail.com
Thu Dec 31 02:29:42 PST 2015


On Tuesday December 29 2015 17:59:07 Rainer Müller wrote:

>> 1 its mtime is newer (or just different)
>
>Different from what? Do you propose to store the mtime of the Portfile

In the first instance I don't see a reason to change how this is done.

>> 2 it lives in a portdir not yet known
>
>As I told earlier, it is not possible to look up a port by portdir with
>the current PortIndex format. The only key to find a port in the
>PortIndex is by its name.

Let's be precise: you mean the portdir name. And that's the bit being looked up from the PortIndex files. 2) means "I encounter a directory containing a Portfile and that directory is not yet listed in the index".

>Looking up a port by portsdir requires either an additional portsdir map

How does that claim not contradict what has been explained before, namely that a port name is "guessed" from its portdir name? I deduce that the name that's actually stored is the true name of the port, taken from the Portfile, but that it is still compared to directory names. I think that must be why a port whose name doesn't match its portdir name is reindexed every time. AFAICS that corresponds to 2), supposing that the indexer takes the name of each directory it encounters that contains a Portfile and then compares it to the list of already indexed (stored) names.

Anyway, I proposed 2) based on the understanding that the portdir name is currently stored. I'm not sure if it is actually required (in fact I think it isn't).
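
To make 2) concrete, here is a rough sketch in Python (not the Tcl that portindex is written in). It assumes the indexer has some set of already-known portdirs at hand; that set, and the names find_portdirs and unknown_portdirs, are hypothetical and not part of the current PortIndex format. It walks a ports tree and reports every directory that contains a Portfile but is not in that set.

```python
import os

# Directories that never hold a port of their own (assumption for this sketch).
SKIP_DIRS = {".svn", "files"}


def find_portdirs(tree_root):
    """Yield every directory under tree_root that contains a Portfile,
    as a path relative to tree_root (e.g. "net/unison")."""
    for dirpath, dirnames, filenames in os.walk(tree_root):
        # Prune directories we never want to descend into.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        if "Portfile" in filenames:
            yield os.path.relpath(dirpath, tree_root)
            # A portdir is a leaf for indexing purposes: do not descend further.
            dirnames[:] = []


def unknown_portdirs(tree_root, indexed_portdirs):
    """Return the portdirs found on disk that are not yet in the
    (hypothetical) set of indexed portdirs."""
    return sorted(d for d in find_portdirs(tree_root)
                  if d not in indexed_portdirs)
```

Calling unknown_portdirs("/path/to/dports", known) would then give the list of directories that condition 2) flags for indexing, independently of whether the port name matches the directory name.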

>> 3 its hash is different from the one stored in one or both of the
>> PortIndex files (or none is stored, but that should already be caught by 2)
>
>What would that solve?

The hole in the bucket :)

The idea of adding a hash didn't come out of the blue in this thread.

>Hashing the Portfile just needs more time. Maybe

Well, yes, but that doesn't have to be a significant overhead. I invite anyone who doubts that to try port:unison for a while, for instance to sync the main port tree with a backup copy. It'll take a certain time on the first run (I cannot recall having synced the port tree with it), but once that's done, checks for what to sync are almost instantaneous. From what I know of its implementation, it uses a combination of mtime, possibly size, and a hash to determine whether a file has been changed. Getting the exact hashing function out may not be trivial given that unison is written in OCaml, but it certainly is a good proof of concept.

Writing this got me curious, esp. since I was already working from my slowest machine (a 3-year-old netbook under Linux with a ZFS filesystem, everything to bog it down). I made a copy of svn.macports.org/trunk/dports, told unison to ignore the .svn and files directories, and let it compare the trees. The initial run took just over 2 minutes; subsequent rescans are maybe not "instantaneous" but take no more than a few blinks of an eye (2-3 seconds).

>we could only hash if it has the same mtime, but that would address a
>hypothetical case.

Actually, no; that would catch (avoid) the large majority of hash calculations. And if it doesn't: calculating a hash (*not* in pure Tcl code) is always faster than parsing the Portfile.
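
A minimal sketch of that check order, in Python rather than Tcl, and only one possible reading of the three conditions: unknown entry first, then the cheap mtime comparison, and the hash only when the mtime is unchanged. The stored (mtime, digest) pair and the names record_for and reindex_reason are hypothetical; nothing like it exists in the current PortIndex. A unison-style variant could instead trust an unchanged mtime and hash only on a mismatch, to skip parsing files that were merely touched.

```python
import hashlib
import os


def file_digest(path):
    """MD5 of the file contents, read in chunks so large files are fine."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_for(path):
    """Build the (mtime_ns, digest) pair that a hypothetical index would store."""
    return (os.stat(path).st_mtime_ns, file_digest(path))


def reindex_reason(path, stored):
    """Return why the Portfile at path needs reindexing, or None if it doesn't.

    stored is the (mtime_ns, digest) pair from the hypothetical index,
    or None when nothing is known about this Portfile yet.
    """
    if stored is None:
        return "unknown"              # condition 2: not yet indexed
    stored_mtime, stored_digest = stored
    if os.stat(path).st_mtime_ns != stored_mtime:
        return "mtime"                # condition 1: cheap check, no hashing
    if file_digest(path) != stored_digest:
        return "hash"                 # condition 3: same mtime, other content
    return None
```

Only a non-None result would lead to the Portfile actually being parsed.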

Tcl has support, through tcllib, for (from faster to slower) crc32 (crc::crc32), md4 (md4::md4) and md5 (md5::md5), all of which can operate on a file.

>Your proposal would make the generation of PortIndex more robust, but
>also requires a redesign of the PortIndex format to make it possible to
>look up ports by either portsdir or by port name.

Yes. That too is something that was on the table.

However:

>Anyway, using the name of the port for the ports dir is a reasonable
>convention in my opinion. We should just keep it that way, regardless of
>the current technical requirement.

If that's the consensus or majority opinion I am not going to invest any time in trying to design an improved (more robust AND more flexible) indexing algorithm (only to find that my investment was for nothing).

R.

