github port group

Ryan Schmidt ryandesign at macports.org
Sun Apr 22 13:51:24 PDT 2012


On Apr 22, 2012, at 10:22, Craig Treleaven wrote:
> At 3:05 AM -0500 4/22/12, Ryan Schmidt wrote:
>> 
>> On Apr 21, 2012, at 20:30, Sean Farley wrote:
>>> Which then will use the zip / tarball download by default
>> 
>> I didn't think github had automated downloads available except for tags.
>> 
>> If github has automated downloads available for any tag/branch as well, then we would need to verify that they always have the same checksums, and are not generated on the fly. I'm pretty sure that bitbucket, for example, generates them on the fly, meaning different users requesting them at different times will get different checksums, which means they're not suitable for use as master_sites in MacPorts.
> 
> 
> I'm no Git expert, but wouldn't git archive help us?  Git archive will retrieve from a remote repository and can format the result as a zip file.  From the man page:
>      <tree-ish>
>           The tree or commit to produce an archive for.

Yes, a port developer could use such a command to create a tarball, which would then be uploaded by some manual process to the distfiles.macports.org server, and the portfile would then be manually modified to reference that distfile instead of fetching from git. This is a process we have recommended before, but since it involves a lot of manual labor, most people don't do it. It's not the process that's been discussed thus far in this mailing list thread.
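
For reference, the tarball-creation step could look roughly like the following. This is only a sketch: the throwaway repository, the file name, the "demo-1.0" prefix and the tarball name are all invented for the example, and the upload to distfiles.macports.org is not shown.

```shell
#!/bin/sh
# Sketch: build a distfile from one specific commit with git archive.
# The repository and every name below are hypothetical stand-ins.
set -e
work=$(mktemp -d)
cd "$work"

# A throwaway repository so the example is self-contained.
git init -q demo
cd demo
printf 'int main(void) { return 0; }\n' > main.c
git add main.c
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m 'Initial commit'

# The <tree-ish> argument: here, the full hash of the commit to package.
commit=$(git rev-parse HEAD)

# Write an uncompressed tar whose contents sit under a top-level directory.
git archive --format=tar --prefix=demo-1.0/ -o ../demo-1.0.tar "$commit"

cd ..
listing=$(tar -tf demo-1.0.tar)
echo "$listing"
```

In a real port the commit hash would come from the upstream repository rather than from a repository created on the spot.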


> I don't know whether the checksums would always be the same.  As I understand Git, a commit hash uniquely identifies a particular state of the repository so if we specify a hash, we'll always get precisely the same result.  The only exception would be, I guess, if someone has hacked the repository.  I have no idea if that could be done without detection by the repository site.  I take it that is what we're trying to protect against?

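No, I'm trying to protect against the gzip-compressed tar archive varying from generation to generation. By default gzip records metadata in the header of the file it writes: the original file name and a time stamp. (The compressed data itself is the same for a given gzip version and compression level.) So if you have two identical tar archives and gzip them with the same settings but under different names or at different times, the resulting gzip files will not be byte for byte identical, and thus they'll have different checksums: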

$ sha1sum *.tar
92bfe8b02b49b977a18c9f8e8d301a0ef159fe51  1.tar
92bfe8b02b49b977a18c9f8e8d301a0ef159fe51  2.tar
$ gzip 1.tar
$ gzip 2.tar
$ sha1sum *.tar.gz
39c6beda6851d98295f770a11b8ea122647ae4c8  1.tar.gz
7a95ea746e698d367ec155e4387972051e1a2e38  2.tar.gz
$ 
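
A minimal sketch of where the difference in a test like this comes from, assuming a gzip that supports the standard -n (--no-name) option, which leaves the original file name and time stamp out of the header. The scratch directory and all file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compress two identical tar archives with and without header metadata.
# All file names here are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"

printf 'hello\n' > payload.txt
tar -cf 1.tar payload.txt
cp 1.tar 2.tar                 # byte-for-byte copy of the same archive

# Default behaviour: gzip may store the original name and mtime in the header.
cp 1.tar 1-default.tar
cp 2.tar 2-default.tar
gzip 1-default.tar
gzip 2-default.tar

# With -n the name and time stamp are omitted from the header.
gzip -n -c 1.tar > 1-noname.tar.gz
gzip -n -c 2.tar > 2-noname.tar.gz

if cmp -s 1-default.tar.gz 2-default.tar.gz; then
    default_result=same
else
    default_result=different
fi
if cmp -s 1-noname.tar.gz 2-noname.tar.gz; then
    noname_result=same
else
    noname_result=different
fi
echo "default: $default_result"
echo "with -n: $noname_result"
```

With GNU gzip I would expect the default pair to differ and the -n pair to match, but other gzip implementations may treat the header differently, so this is worth checking on the system in question.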

It seems unlikely that github has an infinite amount of disk space to forever retain any tarball of any revision of any repository that some user may only have requested one time and nobody will ever request again. So I would assume they keep this generated tarball around for a period of time, maybe 24-48 hours, and then delete it if it hasn't been requested again.

Tarballs of tags, on the other hand, I believe they do keep forever, since it's reasonable to expect tarballs of tags to be downloaded often, and it's desirable for them to have the same checksums, so they can be verified, as MacPorts does.





More information about the macports-dev mailing list