Packages Not [was Re: ambivalence about fortran (was Re: numpy & non-Apple gcc?)]

Mon Sep 20 16:11:06 PDT 2010

On Sep 20, 2010, at 2:20 PM, Ryan Schmidt wrote:

> I have been interested in binary packages but have not really understood how people wanted to do it, so this thread is instructive to me.
> 
> I had assumed that this "tool" you talk about being run would be MacPorts: the port command: "sudo port install foo", that it would just be enhanced to be able to download pre-compiled binaries that matched the OS, architecture and variants the user requested, if such existed on our hypothetical binaries server, and if not, it would build from source as it always has so far.

So, here's the 10,000-foot view of packages in the hope that it may be somewhat instructive (sorry, Jeremy, I know you're all about the archives, but archives just don't interest me since they cannot be used stand-alone and separate from MacPorts ;-).

First, and to answer your specific question, if you have to fall back to building from source then you're really talking about a hybrid system, and I think hybrids basically suck because they fail to keep church and state separate.  I will explain the metaphor:  Let's say I have some set of Mac customers out there who truly represent "end users" and have minimalist end-user configurations: no dev tools, maybe even no X11, nothing but a stock machine as configured at the factory and purchased at the Apple store.  I can make some general statements about those users ("46% run Leopard, 50% run Snow Leopard, 4% other") and figure out just what sorts of software I can provide, linked against some specific set of OS components, and still have a good end-user experience with that software (e.g. it runs and doesn't crash or immediately fall over with some missing library).

Great.  Now I know that I need to set up my BUILD environment such that it is a good match for the majority of those customers, given their minimalist configs, and that I can also build software on that environment all day long *without polluting it*.  That is key.  If my builder self-corrupts over time through stuff escaping my build jails then the system is basically useless because I'm going to start inadvertently creating software that my users cannot use.  I want my build machine to be sacrosanct and utterly untouched by those evil "package" things since it's a builder, not a target.

For my customer machines, conversely, I want to be able to see them all purely as targets and not have any artifact of the build process leak through to them.  They can't even build software anyway since, as I noted, they are minimalist configs.  Ideally, after I am done setting them up, their software config should represent nothing more than the factory config + my packages, and if my package management system is at all reasonable, it will also allow me to uninstall one or all of those packages in order to bring their system back to factory configuration (let's say something goes wrong - the customer is obviously going to blame me and my packages first, and I may need to get rid of them all just to exonerate myself).

So, to finish the "mission statement" side of this, builders should build and installers should install.  That is the separation of church and state which allows people like me to sleep at night, knowing that the entire "assembly line" has been put together as robustly, and with as many clean points of separation, as possible, and that nothing is going to actually perturb a system unless I explicitly ask for that to happen.

So, as complicated as everyone seems to like to make this every time the subject comes up, the actual task to be done is fairly simple.   MacPorts needs to work essentially like this:

port install foo
entails:
	fetch foo
	extract foo
	configure / build foo
	"install" foo to destroot
	make package from destroot (say, just for discussion's sake, that package goes into /tmp in this scenario)
	cleanup destroot and basically "finish" the macports phase

	package-install /tmp/foo.xpkg
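
To make the flow above concrete, here is a minimal Python sketch of the builder/installer split.  Everything in it is an assumption made for illustration: the function names (port_install, make_package, package_install) are hypothetical, a zip file stands in for whatever the real .xpkg format would be, and the "build" callback stands in for the fetch / extract / configure / build / destroot phases.  It is not how MacPorts is implemented.

```python
import os
import shutil
import zipfile


def make_package(destroot, package_path):
    """Builder side: archive every file under destroot into one package file.

    A zip file is only a stand-in for the real package format.
    """
    with zipfile.ZipFile(package_path, "w") as zf:
        for dirpath, _dirs, filenames in os.walk(destroot):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                zf.write(full, arcname=os.path.relpath(full, destroot))
    return package_path


def package_install(package_path, target_root):
    """Installer side: knows nothing about building, only unpacks a package."""
    with zipfile.ZipFile(package_path) as zf:
        zf.extractall(target_root)


def port_install(name, build, work_dir, target_root):
    """Hypothetical 'port install foo': build, package, clean up, hand off."""
    destroot = os.path.join(work_dir, "destroot")
    os.makedirs(destroot)
    build(destroot)  # stands in for fetch/extract/configure/build/destroot
    pkg = make_package(destroot, os.path.join(work_dir, name + ".xpkg"))
    shutil.rmtree(destroot)  # the build side finishes here
    package_install(pkg, target_root)  # the only step that touches the target
    return pkg
```

The point of the split is that package_install could be shipped and run on its own, on a machine that has never seen the build side.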

From an end-user perspective, as Jeremy has already described, "port install foo" is still all that anyone using MacPorts needs to know and it's still just at arm's length as it always was; it just makes MacPorts simpler to only have to generate packages as its end-goal.  It does not have to worry about installation, rollback, upgrades or any of that stuff since that is the job of package-install / package-delete / package-upgrade, and the creators of those tools can take on all of the security / auditing requirements that any software which actively changes your system ought to at least pay lip service to.  MacPorts does not have to worry about it because it is not, in a large sense, MacPorts' problem anymore.

Now, if that all seems maybe a bit *too* simplistic, it's because it is.  There is some stuff that needs to be portaged across from the Portfile (or some other source of metadata which lives in MacPorts) in the form of package metadata because, as I've already pointed out to Jeremy et al, there are things which don't simply fit into the archive.  You probably want a list of checksums to compare the target files against, just to make sure the package file was not corrupted/tampered with in transit, and then there are files which don't actually get installed with the package but need to be preserved across upgrades (configuration files that the end-user creates, mostly).
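
As an illustration of that metadata, here is a small Python sketch of a package manifest that carries per-file checksums plus a list of files to preserve across upgrades.  The manifest layout and the function names (build_manifest, verify, paths_to_delete) are assumptions invented for this example, and SHA-256 is just one reasonable choice of checksum; nothing here describes an existing MacPorts format.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(destroot, preserve=()):
    """Builder side: record a checksum for every file in the destroot.

    'preserve' lists paths the package does not ship but that must survive
    upgrades (for example, configuration files the end user creates).
    """
    files = {}
    for dirpath, _dirs, filenames in os.walk(destroot):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            files[os.path.relpath(full, destroot)] = sha256_of(full)
    return {"files": files, "preserve": sorted(preserve)}


def verify(root, manifest):
    """Installer side: list files that are missing or whose checksum differs."""
    bad = []
    for rel, expected in manifest["files"].items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)


def paths_to_delete(manifest, existing_paths):
    """On upgrade or removal, keep anything the manifest marks as preserved."""
    keep = set(manifest["preserve"])
    return [p for p in existing_paths if p not in keep]
```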

There are also the usual post-install actions ("install user postgres for this postgres database package", etc) which need to be expressed in some form - preferably not just by throwing a shell script at the problem, either, since a shell script is hard to introspect and audit.  A list of "requirements" as an XML file could probably do the job just as well, assuming you designed the requirements with some reasonable degree of care and had the requisite code in package-install to parse and act on that data as a final package install step.   We (Landon and I) got ourselves wrapped around the axle in the past trying to figure out a scheme where the existing Tcl procs (post-install, et al) could be run by the package runtime, making the Portfile the single authoritative source for all of this information, but maybe that was setting the bar too high and we should have aimed at something far more simplistic.  It's never too late to do that.
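
For a sense of what such a declarative requirements file might look like, here is a Python sketch that parses one and rejects anything it does not recognise.  The XML element names ("user", "directory"), their attributes, the example paths, and the parse_requirements function are all hypothetical, made up for this illustration; the parser only turns the file into a list of actions and does not perform them.

```python
import xml.etree.ElementTree as ET

# Hypothetical requirements file for a postgres package.
REQUIREMENTS_XML = """
<requirements>
  <user name="postgres" home="/var/empty"/>
  <directory path="/opt/local/var/db/postgres" owner="postgres"/>
</requirements>
"""

# The closed set of actions the installer is willing to perform, with the
# attributes each one must carry.  Unlike a shell script, this can be
# audited by reading one small table.
KNOWN_ACTIONS = {
    "user": ("name",),
    "directory": ("path",),
}


def parse_requirements(xml_text):
    """Return a list of (action, attributes) pairs, refusing unknown input."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "requirements":
        raise ValueError("unexpected root element: " + root.tag)
    actions = []
    for elem in root:
        if elem.tag not in KNOWN_ACTIONS:
            raise ValueError("unknown requirement: " + elem.tag)
        for attr in KNOWN_ACTIONS[elem.tag]:
            if attr not in elem.attrib:
                raise ValueError(elem.tag + " is missing attribute " + attr)
        actions.append((elem.tag, dict(elem.attrib)))
    return actions
```

Because the installer only acts on a fixed vocabulary, a package can be inspected before installation to see exactly what it will ask for.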

Daniel also chastises me somewhat for "having a lot to say" every time this topic comes up, and I apologize if I've seemed overly verbose over the years, but I hope folks will cut me at least a few inches of slack given that I have actually written all of this crap from scratch before and know how it actually works in practice, at least. :-)

> I had no idea there was a goal for users not to have MacPorts installed.
> 
> We just recently added the hard requirement in the MacPorts installer that Xcode be installed, since not having Xcode installed was causing so many various bug reports. But if MacPorts gets a binaries server, and the ability to download and install from it, that requirement could be relaxed and the check moved into the port command such that the port command only errors out and notifies the user Xcode is required if a binary package matching the user's request was not found.

You could do that, or you could just punt and say "file a request asking for a binary package for foo" and not even try to make MacPorts do any complicated fall-back behavior (since, again, this behavior would actually have to be in package-install, and the church/state line starts getting blurry again).  That said, I see no reason why package-install couldn't ask a server "in the cloud" somewhere to actually build a package to spec, and get this package once done.  The very act of making the request could cause the server to create the package on demand, in which case all of your abstraction boundaries remain nicely preserved.
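
The build-on-request idea can be sketched in a few lines of Python.  The PackageServer class, its request method, and the builder callback are hypothetical names invented for this example, and an in-memory dictionary stands in for whatever storage a real server would use.  The client only ever asks for a package matching a spec; whether it was already built or gets built on demand is invisible to it.

```python
class PackageServer:
    """Hypothetical server that builds a package the first time it is asked."""

    def __init__(self, builder):
        # builder(name, os_version, arch, variants) returns the package;
        # it stands in for a clean build machine running the build side.
        self._builder = builder
        self._packages = {}
        self.builds = 0  # how many real builds have happened

    def request(self, name, os_version, arch, variants=()):
        # Normalise the spec so the same request always maps to the same key.
        key = (name, os_version, arch, tuple(sorted(variants)))
        if key not in self._packages:
            self._packages[key] = self._builder(*key)
            self.builds += 1
        return self._packages[key]
```

All building stays on the server side of the line, so the installer never needs a compiler or a fall-back path of its own.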

The reason not to have MacPorts installed is largely one of purity.  If you're an end-user who does not care, then you use the packages collection through some Cocoa / Web front-end and never have to even know MacPorts exists.  If you're a developer / propeller-head type and actually do want to build stuff yourself, then you can download MacPorts and use it the same as always, the actual package step being embedded into/hidden by "port install" as described above.  I just see no reason why a user who cares only about binary packages should have to download MacPorts since the two are logically distinct.

You also ask why they have to be separate, and I guess if I had to give a one-word answer other than "purity", it would be security.  The smaller your attack surface is, and getting bits onto a user's system in an untrusted/unauditable/one-way fashion is definitely an attack surface, the better off you are.  It doesn't even have to be a malicious type of attack, but simply one caused by inattention to some detail.   If the set of operations can be constrained to not calling out to a 3rd party build system, that's a lot less attack surface to worry about and maintain control of.

- Jordan