Persistent copy of a [git] repository

Mojca Miklavec mojca at macports.org
Fri Oct 30 03:39:29 PDT 2015


Hi,

I'm reviving an old thread (fetch.type git & GitHub submodules).

I tried to make a proof-of-concept Portfile that is able to maintain a
persistent copy of the sources from a git repository:
    https://trac.macports.org/attachment/ticket/16373/xchm.Portfile
    https://trac.macports.org/ticket/16373#comment:32

The code should go elsewhere, but I would be grateful for feedback.
(Yes, we also need a solution for cvs, svn, hg, but let's start with
"the easiest" vcs type and see where we can get with that one.)

Mojca


On Wed, Mar 4, 2015 at 11:41 PM, Mojca Miklavec wrote:
> On Wed, Mar 4, 2015 at 10:53 PM, Rainer Müller wrote:
>> On 2015-03-04 22:27, Mojca Miklavec wrote:
>>>> I agree with you, creating the distfiles from VCS would be possible.
>>>>
>>>> There could be a target to be run on 'port mirror' that downloads and
>>>> creates a tarball if a non-default fetch.type is used. That alone would
>>>> reduce multiple downloads and even makes port development faster.
>>>>
>>>> However, for end-users, there is the problem that we would need to
>>>> distribute checksums for these tarballs (or rely on signatures only?).
>>>
>>> Of course we would have to distribute the checksums in that case, like
>>> for any other port. What exactly is considered a "problem" here?
>>
>> There are two options:
>>
>> a) the maintainer generates the tarball locally, uploads it to the main
>> mirror and also adds an additional checksum to the Portfile before
>> committing it
>
> No, this is certainly not what I had in mind.
>
> (But now that you mentioned it, it reminded me that we might sometimes
> want other complex strategies to fetch files, not just from VCS. Like
> wxPython where we only extract 400 kB out of a 50 MB file and it would
> be a waste to have to store and fetch all the 50 MB.)
>
>> b) tarballs are generated automatically on the server after the Portfile
>> was committed
>
> I was thinking of that option. (Tarballs would also be automatically
> generated on the user's machine straight from VCS if the file didn't
> exist on the mirror.)
>
>> I would prefer b) for the simple fact that this ensures that the
>> maintainer did not modify any of the files and would be closer to our
>> existing distfiles mirroring. The infrastructure changes would be small
>> if it can be integrated into what the existing 'port mirror' does.
>> However, checksums for the generated tarball are definitely not known at
>> the time the Portfile is committed.
>
> Why not? At least for Git I can show you a trivial way to create a
> compressed file in a repeatable way. That way anyone would get the
> same checksums and the maintainer can easily add the checksums to the
> Portfile *before* committing.
>
> For other VCS it might be just slightly more complicated (I'm not so
> familiar with them), but probably not much. One just needs to make
> sure that all the files have reproducible/stable timestamps, including
> the tar file.
>
>> One solution for this would be to add an additional file in the port
>> directory after tarball generation that holds the checksums. Or, the
>> generated tarballs are also signed by the job that generated them. With
>> the signature it is possible to verify that this is the intended file
>> without distributing any additional checksum through other channels.
>
> That sounds way too complex.
>
>> In general, note that generating a tarball might include timestamps,
>> usernames, and other metadata.
>
> I just checked. I tried to generate a .tar.xz file on three different
> machines: one Mac OS X machine and two Linux boxes, with a different
> username and a different userid/groupid on every machine. And I got
> exactly the same checksum on all three machines.
>
>> Generating it multiple times, locally by
>> the maintainer and once again on the server, will not always give the
>> same results. Although that would be the closest to what we do for
>> distfiles at the moment, combining the checksum in Portfile from a)
>> *and* the automatic generation on the server from b) is not possible.
>
> So what exactly did I do wrong when I managed to get the same
> checksums on three machines?
>
> Here's what I did:
>     git clone <some project> && cd <project>
>     git archive <shasum> | xz > ../test.tar.xz
>     md5 ../test.tar.xz
>
> I admit that I don't have enough experience to claim that this would
> generate the same checksums in all the possible scenarios that one can
> think of and all hardware that one can imagine, but I would be
> comfortable enough to speculate that in most cases there shouldn't be
> any difference on the user's Mac and the server.
>
> (I didn't test on servers in different timezones, but I would imagine
> that there must be a cure for that if it resulted in some
> discrepancies. I also didn't think of the zillions of other possible
> scenarios that could potentially spoil the game. I could imagine
> potential problems with repositories with "Makefile" and "makefile" in
> the same dir, resulting in different tarballs depending on whether or
> not the operation was performed on a case-sensitive partition. But we
> should ask for a "bugfix" if we come across such cases. And I know for
> a fact that CVS has some "dementia symptoms" and the strategy wouldn't
> work for CVS as it keeps losing files and folders. But given how old
> and broken CVS is, I really wouldn't care.)
>
> Mojca
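
For VCS types that have no equivalent of "git archive", the requirement
quoted above (stable timestamps for every file and for the tar file
itself, no usernames or ids) can be sketched in Python. This is only an
illustration under stated assumptions, not the code from the
proof-of-concept Portfile: the names deterministic_tar and tree_digest
are made up here, only regular files are handled, empty directories are
skipped, and compression is left out (the compressor and its version
may also influence the final checksum).

```python
import hashlib
import io
import os
import stat
import tarfile


def deterministic_tar(src_dir, mtime=0):
    """Return tar bytes for src_dir with all variable metadata normalized.

    Hypothetical helper for illustration: regular files only.
    """
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            # Store paths relative to src_dir, always with "/" separators.
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            paths.append((rel, full))

    buf = io.BytesIO()
    # Pin the tar format so a changed library default cannot alter the bytes.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorted order removes any dependence on directory listing order.
        for rel, full in sorted(paths):
            with open(full, "rb") as fh:
                data = fh.read()
            info = tarfile.TarInfo(name=rel)  # uid/gid 0, empty user/group names
            info.size = len(data)
            info.mtime = mtime  # fixed timestamp instead of the on-disk one
            # Keep only the executable bit; drop every other permission detail.
            executable = os.stat(full).st_mode & stat.S_IXUSR
            info.mode = 0o755 if executable else 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def tree_digest(src_dir, mtime=0):
    """SHA-256 of the normalized tarball (hypothetical helper)."""
    return hashlib.sha256(deterministic_tar(src_dir, mtime)).hexdigest()
```

Two checkouts with identical file contents should then give the same
tree_digest() regardless of who created them or when. For Git itself,
the git-archive documentation states that the commit time is used as
the file modification time when a commit or tag id is given, while a
tree id gets the current time, so the <shasum> in the recipe above
should name a commit or tag.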

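The "Makefile" versus "makefile" concern from the quoted message could
be checked before a tarball is generated. The sketch below is a rough
approximation: case_collisions is a hypothetical name, and str.casefold()
is assumed to be close enough to how a case-insensitive file system
compares names (real file systems may also normalize Unicode
differently). The path list could come, for example, from
"git ls-tree -r --name-only <commit>".

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(set)
    for path in paths:
        # Paths with the same case-folded form would clash on a
        # case-insensitive partition.
        groups[path.casefold()].add(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

An empty result means no two paths in the list would overwrite each
other on a case-insensitive partition, at least under the assumption
above.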
