Persistent copy of a [git] repository

Mojca Miklavec mojca at macports.org
Tue Nov 3 08:12:43 PST 2015


Reposting the question ... I would be grateful for some feedback (and
for someone who could help me move the code to the core or some PortGroup,
in case the approach sounds reasonable enough).

Mojca

On Fri, Oct 30, 2015 at 11:39 AM, Mojca Miklavec wrote:
> Hi,
>
> I'm reviving an old thread (fetch.type git & GitHub submodules).
>
> I tried to make a proof-of-concept Portfile that is able to maintain a
> persistent copy of the sources from a git repository:
>     https://trac.macports.org/attachment/ticket/16373/xchm.Portfile
>     https://trac.macports.org/ticket/16373#comment:32
>
> The code should go elsewhere, but I would be grateful for feedback.
> (Yes, we also need a solution for cvs, svn, hg, but let's start with
> "the easiest" vcs type and see where we can get with that one.)
>
> Mojca
>
>
> On Wed, Mar 4, 2015 at 11:41 PM, Mojca Miklavec wrote:
>> On Wed, Mar 4, 2015 at 10:53 PM, Rainer Müller wrote:
>>> On 2015-03-04 22:27, Mojca Miklavec wrote:
>>>>> I agree with you, creating the distfiles from VCS would be possible.
>>>>>
>>>>> There could be a target to be run on 'port mirror' that downloads and
>>>>> creates a tarball if a non-default fetch.type is used. That alone would
>>>>> reduce multiple downloads and even make port development faster.
>>>>>
>>>>> However, for end-users, there is the problem that we would need to
>>>>> distribute checksums for these tarballs (or rely on signatures only?).
>>>>
>>>> Of course we would have to distribute the checksums in that case, like
>>>> for any other port. What exactly is considered a "problem" here?
>>>
>>> There are two options:
>>>
>>> a) the maintainer generates the tarball locally, uploads it to the main
>>> mirror and also adds an additional checksum to the Portfile before
>>> committing it
>>
>> No, this is certainly not what I had in mind.
>>
>> (But now that you mentioned it, it reminded me that we might sometimes
>> want other complex strategies to fetch files, not just from VCS. Like
>> wxPython where we only extract 400 kB out of a 50 MB file and it would
>> be a waste to have to store and fetch all the 50 MB.)
>>
>>> b) tarballs are generated automatically on the server after the Portfile
>>> was committed
>>
>> I was thinking of that option. (Tarballs would also be automatically
>> generated on the user's machine straight from VCS if the file didn't
>> exist on the mirror.)
>>
>>> I would prefer b) for the simple fact that this ensures that the
>>> maintainer did not modify any of the files and would be closer to our
>>> existing distfiles mirroring. The infrastructure changes would be small
>>> if it can be integrated into what the existing 'port mirror' does.
>>> However, checksums for the generated tarball are definitely not known at
>>> the time the Portfile is committed.
>>
>> Why not? At least for Git I can show you a trivial way to create a
>> compressed file in a repeatable way. That way anyone would get the
>> same checksums and the maintainer can easily add the checksums to the
>> Portfile *before* committing.
>>
>> For other VCS it might be just slightly more complicated (I'm not so
>> familiar with them), but probably not much. One just needs to make
>> sure that all the files have reproducible/stable timestamps, including
>> the tar file.
>>
>>> One solution for this would be to add an additional file in the port
>>> directory after tarball generation that holds the checksums. Or, the
>>> generated tarballs are also signed by the job that generated them. With
>>> the signature it is possible to verify that this is the intended file
>>> without distributing any additional checksum through other channels.
>>
>> That sounds way too complex.
>>
>>> In general, note that generating a tarball might include timestamps,
>>> usernames, and other metadata.
>>
>> I just checked. I tried to generate a .tar.xz file on three different
>> machines: one Mac OS X, two Linux boxes. A different username and a
>> different userid/groupid on every machine. And I got exactly the same
>> checksum on all three machines.
>>
>>> Generating it multiple times, locally by
>>> the maintainer and once again on the server, will not always give the
>>> same results. Although that would be the closest to what we do for
>>> distfiles at the moment, combining the checksum in Portfile from a)
>>> *and* the automatic generation on the server from b) is not possible.
>>
>> So what exactly did I do wrong when I managed to get the same
>> checksums on three machines?
>>
>> Here's what I did:
>>     git clone <some project> && cd <project>
>>     git archive <shasum> | xz > ../test.tar.xz
>>     md5 ../test.tar.xz
>>
>> I admit that I don't have enough experience to claim that this would
>> generate the same checksums in all the possible scenarios that one can
>> think of and all hardware that one can imagine, but I would be
>> comfortable enough to speculate that in most cases there shouldn't be
>> any difference on the user's Mac and the server.
>>
>> (I didn't test on servers in different timezones, but I would imagine
>> that there must be a cure for that if that would result in some
>> discrepancies. I also didn't think of the zillions of other possible
>> scenarios that could potentially spoil the game. I could imagine
>> potential problems with repositories with "Makefile" and "makefile" in
>> the same dir, resulting in different tarballs depending on whether or
>> not the operation was performed on a case sensitive partition. But we
>> should ask for a "bugfix" if we come across such cases. And I know for
>> a fact that CVS has some "dementia symptoms" and the strategy wouldn't
>> work for CVS as it keeps losing files and folders. But given how old
>> and broken CVS is, I really wouldn't care.)
>>
>> Mojca
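
For anyone who wants to try the "git archive | xz" recipe quoted above
without a real project at hand, here is a minimal sketch that builds the
archive twice from the same commit and compares the bytes. It is only an
illustration under stated assumptions, not the code from the ticket: the
throwaway repository, the README file, the demo identity and the
make_tarball helper are all made up for the example, and the fallback to
"gzip -n" is there only in case xz is not installed. It checks
repeatability on one machine, not across machines or xz versions.

```shell
#!/bin/sh
# Sketch: build the same archive twice from one commit and compare the bytes.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway repository standing in for "<some project>" (hypothetical content).
git init -q "$work/repo"
cd "$work/repo"
echo 'hello' > README
git add README
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m 'initial commit'
commit=$(git rev-parse HEAD)

# Assumption: fall back to "gzip -n" (no name, no timestamp) when xz is missing.
compress=xz
command -v xz >/dev/null 2>&1 || compress='gzip -n'

# Archive a commit, not a tree: git archive then uses the commit time as the
# mtime of every entry instead of the current time.
make_tarball() {
    git archive --format=tar "$commit" | $compress > "$1"
}

make_tarball "$work/first.tar.xz"
sleep 1   # let the wall clock advance so a leaked timestamp would show up
make_tarball "$work/second.tar.xz"

if cmp -s "$work/first.tar.xz" "$work/second.tar.xz"; then
    result=identical
else
    result=different
fi
echo "$result"
```

The one detail that matters here is passing a commit ID to git archive.
With a bare tree ID, git archive stamps the entries with the current time,
and the two files would no longer match.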

