portindex ignores (filters out) unchanged port

Thu Dec 31 06:09:10 PST 2015

Why not read the code rather than try to deduce it from observed behavior?

On December 31, 2015 5:29:42 AM EST, "René J.V. Bertin" <rjvbertin at gmail.com> wrote:
>On Tuesday December 29 2015 17:59:07 Rainer Müller wrote:
>
>>> 1 its mtime is newer (or just different)
>>
>>Different from what? Do you propose to store the mtime of the Portfile
>
>In the first instance I don't see a reason to change how this is done.
>
>>> 2 it lives in a portdir not yet known
>>
>>As I said earlier, it is not possible to look up a port by portdir with
>>the current PortIndex format. The only key to find a port in the
>>PortIndex is by its name.
>
>Let's be precise: you mean the portdir name. And that's the bit being
>looked up from the PortIndex files. 2) means "I encounter a directory
>containing a Portfile and that directory is not yet listed in the
>index".
>
>>Looking up a port by portsdir requires either an additional portsdir map
>
>How does that claim not contradict what has been explained before,
>namely that a port name is "guessed" from its portdir name? I deduce
>that the name that's actually stored is the true name of the port, from
>the Portfile, but it is still compared to directory names. I think that
>must be how a port whose name doesn't match its portdir name is
>reindexed every time. AFAICS that corresponds to 2), supposing that the
>indexer takes the name of each directory it encounters that contains a
>Portfile, and then compares it to the list of already indexed (stored)
>names.
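
For reference, the behavior guessed at above could be sketched roughly as follows. This is Python rather than the Tcl that portindex is written in, and it is not taken from the portindex source: `find_unindexed_portdirs` and the `indexed_names` set are hypothetical names, and the rule "the directory name is the guessed port name" is an assumption.

```python
import os

def find_unindexed_portdirs(tree_root, indexed_names):
    """Return relative portdirs whose directory name is not an indexed port name.

    indexed_names is a set of lower-case port names (hypothetical stand-in
    for whatever the index actually stores).
    """
    unindexed = []
    for dirpath, dirnames, filenames in os.walk(tree_root):
        if "Portfile" in filenames:
            # Assumption: the port name is guessed from the directory name.
            guessed_name = os.path.basename(dirpath).lower()
            if guessed_name not in indexed_names:
                unindexed.append(os.path.relpath(dirpath, tree_root))
            # A portdir is a leaf for our purposes; skip files/ and friends.
            dirnames[:] = []
    return sorted(unindexed)
```

Under that assumption, a port whose real name differs from its directory name is reported as unindexed on every run, because only the real name ever ends up in `indexed_names`.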
>
>Anyway, I proposed 2) based on the understanding that the portdir name
>is currently stored. I'm not sure if it is actually required (in fact I
>think it isn't).
>
>>> 3 its hash is different from the one stored in one or both of the
>>> PortIndex files (or none is stored, but that should already be
>>> caught by 2)
>>
>>What would that solve?
>
>The hole in the bucket :)
>
>The idea of adding a hash didn't come out of the blue in this thread.
>
>>Hashing the Portfile just needs more time. Maybe
>
>Well, yes, but that doesn't have to be a significant overhead. I invite
>anyone who doubts that to try port:unison for a while, for instance to
>sync the main port tree with a backup copy. It'll take a certain time
>on first run (I cannot recall having synced the port tree with it), but
>once that's done, checks for what to sync are almost instantaneous.
>From what I know of its implementation, it uses a combination of mtime,
>possibly size and a hash to determine whether a file has been changed.
>Getting the exact hashing function out may not be trivial given that
>unison is written in ocaml, but it certainly is a good proof of
>concept.
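
A minimal sketch of the mtime + size + hash combination described above, in Python for brevity. It is not unison's actual algorithm and not portindex code; `needs_reindex`, `file_md5` and the in-memory `cache` dict are made up for illustration, and md5 is just one possible choice of hash.

```python
import hashlib
import os

def file_md5(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def needs_reindex(path, cache):
    """Return True if the file content changed since it was last seen.

    cache maps path -> (mtime_ns, size, md5) and is updated in place.
    """
    st = os.stat(path)
    entry = cache.get(path)
    # Cheap check first: same mtime and size means we assume no change
    # and skip hashing entirely.
    if entry is not None and entry[0] == st.st_mtime_ns and entry[1] == st.st_size:
        return False
    # Metadata differs (or the file is new): only now pay for the hash.
    new_hash = file_md5(path)
    changed = entry is None or entry[2] != new_hash
    cache[path] = (st.st_mtime_ns, st.st_size, new_hash)
    return changed
```

With this ordering a touched but otherwise identical Portfile costs one hash and no reparse, and an untouched one costs a single stat call.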
>
>Writing this got me curious, esp. since I was already working from my
>slowest machine (a 3yo netbook under Linux with a ZFS filesystem,
>everything to bog it down). I made a copy of
>svn.macports.org/trunk/dports, told unison to ignore the .svn and files
>directories, and let it compare the trees. The initial run took just
>over 2 minutes; subsequent rescans are maybe not "instantaneous" but
>take no more than a few blinks of an eye (2-3 seconds).
>
>>we could only hash if it has the same mtime, but that would address a
>>hypothetical case.
>
>Actually, no; that would catch (avoid) the large majority of hash
>calculations. And if it doesn't: calculating a hash (*not* in pure Tcl
>code) is always faster than parsing the Portfile.
>
>Tcl has support through tcllib for (from faster to slower) crc32
>(crc::crc32), md4 (md4::md4) and md5 (md5::md5), all of which can
>operate on a file.
>
>>Your proposal would make the generation of PortIndex more robust, but
>>also requires a redesign of the PortIndex format to make it possible to
>>look up ports by either portsdir or by port name.
>
>Yes. That too is something that was on the table.
>
>However:
>
>>Anyway, using the name of the port for the ports dir is a reasonable
>>convention in my opinion. We should just keep it that way, regardless of
>>the current technical requirement.
>
>If that's the consensus or majority opinion I am not going to invest
>any time in trying to design an improved (more robust AND more
>flexible) indexing algorithm (only to find that my investment was for
>nothing).
>
>R.
>_______________________________________________
>macports-dev mailing list
>macports-dev at lists.macosforge.org
>https://lists.macosforge.org/mailman/listinfo/macports-dev