MacPorts caching of distfiles

William Siegrist wsiegrist at apple.com
Mon Feb 25 09:58:47 PST 2008


On Feb 23, 2008, at 5:20 PM, Ryan Schmidt wrote:
> What about distfiles which are already missing today? How do we get  
> those into the distfiles mirror? Do we have to add them to the  
> repository, so that they can be fetched from somewhere during the  
> post-commit, and then remove them from the repository later? That's  
> wasteful of repository space. I guess committers can put it on their  
> own webspace temporarily, put that URL in the port's master_sites,  
> commit it so the post-commit fetches it, then remove it from the  
> master_sites and commit again. But that's messy. There should be a  
> way to get a distfile directly onto the mirror, for those cases  
> where it's supposed to act as master, not mirror.

The initial load can be handled manually by me. No need for committing
or using other web hosting. If there turns out to be a frequent need
for manually adding files, we can always come up with a simple upload
form.

>
> What about the distfiles currently in the repository? Is there a  
> migration strategy for removing them? Or do we not care about the  
> disk space occupied by those distfiles in the repository? I guess  
> since the disk space won't be reclaimed unless we do a dump and  
> filter and load of the repository, and since that is a big pain to  
> do involving possibly lengthy downtime, we probably won't care  
> enough about the disk space.
>

I don't think the disk space is worth the trouble. The MP repo is only
a small percentage of our total disk usage.


>
> What about distfiles that are stealth-upgraded? For example, I  
> updated the ImageMagick port to 6.3.8-9 on 2008-02-18 and a day  
> later a modified version of the 6.3.8-9 distfile appeared on the  
> download site. A user reported the checksum error to me and I found  
> that a few lines of the source code had been changed in the new
> distfile, so I updated the port revision and the checksums and  
> committed it and closed the ticket. If there had been a MacPorts  
> distfile mirror first in line providing the original distfile to the  
> user, this situation would never have been discovered, and MacPorts  
> users would never get the modified distfile, which seems like a bad  
> thing. The author of the software obviously updated the distfile for  
> a reason and wants users to have that new version.
>

I'd hope the author would bump the version in most cases, but maybe  
livecheck or distcheck can be extended to do checksums of the actual  
distfiles as a way to test for changes?
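
As a rough sketch of that idea (not how livecheck or distcheck work
today): compare a digest of the copy already on the mirror against a
digest of a freshly downloaded upstream copy. The function names are
hypothetical, and both files are assumed to already be on local disk.

```python
import hashlib


def digest_of(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def upstream_changed(mirrored_path, upstream_path, algorithm="md5"):
    """True when the upstream file no longer matches the mirrored copy.

    A True result for a file whose name did not change is the
    "stealth upgrade" case: same distfile name, different contents.
    """
    return digest_of(mirrored_path, algorithm) != digest_of(upstream_path, algorithm)
```

A periodic job could run this per distfile and mail the maintainer on a
mismatch, which would surface silent upstream changes even when the
mirror is first in line.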


> The post-commit hook would have to do not only the fetch but also  
> the checksum phase. If the checksums don't match, then clean --all  
> (i.e. remove the (possibly old) distfile) and fetch and checksum  
> again. If it now checksums properly, great: the distfile was old and  
> has now been updated. If it still doesn't match then the author's  
> checksums are wrong and we run clean --all again (to remove the
> bad distfile from the mirror) and send an automated email to the  
> maintainer or committer or something. This takes care of the issue  
> of the old outdated distfile remaining on the mirror after the port  
> maintainer finds out about the stealth-upgrade and updates the  
> portfile. It does not however solve the problem of how the  
> maintainer would discover the stealth-upgrade in the first place.  
> And it negates one of the benefits of the mirror listed earlier:  
> that older distfiles should remain available for users who haven't  
> updated their ports tree or who deliberately are trying out an older  
> version.
>
> This latter problem even more greatly affects ports whose distfile  
> names do not contain the version number. By my rough grep estimate,  
> we have over 125 ports in this situation. Port authors will discover  
> a new version is available via the livecheck mechanism or via email  
> notification from the project's announce list, one would hope, but  
> once the update is committed, the old distfile won't be in the  
> mirror anymore, if it has the same name as the new file.
>
> I believe I saw that the FreeBSD mirrors put distfiles into a  
> directory whose name is the md5 checksum of that file. If we managed  
> to do that somehow that might solve the problems.
>
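
The fetch / checksum / clean / retry sequence described in the quoted
text above could be sketched roughly as follows. This is only an
illustration: the four callables are hypothetical stand-ins for
whatever the post-commit hook would really invoke (fetch, checksum,
clean --all, and sending mail), and the return values are made up for
the example.

```python
def mirror_distfile(fetch, checksums_match, clean_all, notify):
    """Fetch a distfile onto the mirror, retrying once if a stale copy exists.

    Returns "ok", "refreshed", or "bad-checksums".
    """
    fetch()
    if checksums_match():
        return "ok"
    # The mirror may hold an older file with the same name:
    # discard it and fetch again from upstream.
    clean_all()
    fetch()
    if checksums_match():
        return "refreshed"
    # Still mismatched: the Portfile's checksums do not match what
    # upstream serves. Remove the bad file and tell a human.
    clean_all()
    notify()
    return "bad-checksums"
```

Note that this sketch inherits the drawback mentioned in the quoted
text: the old distfile is deleted, not preserved.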

This goes back to Jordan's suggestion of keeping the namespace flat.
Adding the indirection of MD5 for the minority of ports that need it
might be overkill. Maybe keep the current files around in a flatter
space and bury the historical files down in MD5 or Portfile-versioned
directories? This also seems complicated, but maybe it's a reasonable
compromise.
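
To make the compromise concrete, here is a minimal sketch of one
possible layout: the current file stays in the flat namespace, and a
same-named file with different contents gets moved under a directory
named after its MD5 before being replaced. The "history" directory name
and the publish() function are assumptions for illustration only, and
new_file is assumed to live outside the mirror directory.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Return the MD5 hex digest of the file at `path`."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def publish(mirror_root, new_file):
    """Put new_file into the flat namespace of the mirror.

    If a different file with the same name is already there, move it to
    <mirror_root>/history/<md5-of-old-file>/<name> first.
    Returns the archived path, or None if nothing was archived.
    """
    name = os.path.basename(new_file)
    current = os.path.join(mirror_root, name)
    archived = None
    if os.path.exists(current) and md5_of(current) != md5_of(new_file):
        archive_dir = os.path.join(mirror_root, "history", md5_of(current))
        os.makedirs(archive_dir, exist_ok=True)
        archived = os.path.join(archive_dir, name)
        shutil.move(current, archived)
    shutil.copyfile(new_file, current)
    return archived
```

Only ports that actually reuse a distfile name ever end up with
anything under history/, so the common case stays flat.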


>
> The proposed solution does not cache / mirror distfiles which are  
> added as a result of selecting a variant or platform. Consider the  
> +doc variant of many ports which causes additional documentation  
> files to be downloaded, but there are other use cases as well; just  
> grep for "distfiles-append" in the portfiles and you'll get an idea.  
> There are ports that need to download different bootstrapping code  
> based on platform, ports that download extra code only needed for  
> the extra functionality enabled in a variant, etc.
>
> The fetch phase honors variants too, so we could get the list of  
> variants with "port variants" and run the fetch phase once for each  
> variant (in addition to a run without any variants). e.g. for smlnj  
> we would end up running:
>
> port fetch smlnj
> port fetch smlnj +universal
> port fetch smlnj +powerpc
> port fetch smlnj +i386
>
> In this port, all but +universal would end up fetching extra files.
>
> We would need to anticipate that selecting some variants will cause  
> an error message and a nonzero return code, since it is common  
> practice to display an error message and exit with a nonzero return  
> code in the pre-fetch phase if we want to prevent the port from  
> installing. For example, py-psyco does this if not running on an  
> Intel Mac. We would want to ignore these errors in the post-commit  
> hook.
>
> There's an additional problem of ports that error out based on  
> platform and don't do so in a platform selector (so there's no  
> variant we could select to overcome it). For example, the wine port  
> exits in pre-fetch if you're not running on an Intel Mac, since wine  
> needs an Intel processor. You may tell me this is fine because the  
> Mac OS Forge server runs on Intel, but then you have the same  
> problem with the oracle-instantclient port, which exits if you're  
> not running on PowerPC, since the oracle instantclient currently  
> needs a PowerPC.
>
>



This sort of thing is why I wanted a special phase. Using fetch seems
to carry a lot of baggage (platforms, variants, moving the files into a
different namespace). I thought it would be easier to just add a phase
that could parse a Portfile and figure out all possible distfiles and
their checksums. If the multiple "port fetch" route is in fact easier,
then I'll just need someone to work with to make sure the server
catches all of these cases. (And yes, the servers are Intel, so
something needs to be figured out for the PPC-only ports.)
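
For reference, the multiple "port fetch" route might look something
like the sketch below: one plain fetch plus one fetch per variant, with
nonzero exits recorded and not treated as fatal. The list of variants
is taken as an input here; parsing the output of "port variants" is not
attempted, and the function names are hypothetical. It does nothing
about ports that exit in pre-fetch based on the host platform.

```python
import subprocess


def fetch_commands(port, variants):
    """Build one 'port fetch' command without variants, plus one per variant."""
    commands = [["port", "fetch", port]]
    commands += [["port", "fetch", port, "+" + v] for v in variants]
    return commands


def run_all(commands, run=None):
    """Run every command; return the ones that exited nonzero.

    Failures are collected, not raised, because some variants are
    expected to refuse to fetch on the server's platform.
    """
    if run is None:
        run = lambda argv: subprocess.run(argv).returncode
    failures = []
    for argv in commands:
        if run(argv) != 0:
            failures.append(argv)
    return failures
```

For smlnj with the variants universal, powerpc, and i386, this builds
the same four commands listed in the quoted text above.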



-Bill




----
William Siegrist
Software Support Engineer
Mac OS Forge
http://macosforge.org/
wsiegrist at apple.com
408 862 7337




