[GSoC][Binaries support] Architecture

Sat Mar 26 12:05:45 PDT 2011

On Mar 25, 2011, at 6:28 PM, Felipe Tanus wrote:

> I started to think about an architecture for the binaries support
> project, and I would like to know your opinion at some points. I think
> it's easy to notice that the software has two distinct parts, the
> server and the client. The server is the build and distribution
> solution, and the client is the port modification and maybe the
> binary-only version. I'm splitting my ideas in these two areas.

I think that's a reasonable split, although I expect the "server piece" to end up being by far the smaller of the two challenges (though not an insignificant one, don't get me wrong).  I'll elaborate below.

> * Failures report: It might be obvious, but since it's not on the
> description in idea's list, I think the best way of notification is by
> e-mail. A public mail list can be created to spread this info, maybe
> at macosforge.

If I were doing this, I'd focus first on getting the reports into some sort of database, even a very primitive one consisting of a directory for each batch build and a build log for each port.  You can then "harvest" the output of this database in various ways as a second step: create a web status page that port maintainers can visit daily to see how things are doing, send out an email for each failure log whose port has a maintainer, or do both.  But that's all "step 2".
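To make the "primitive database" idea concrete, here is a minimal sketch in Python.  The layout (one `<port>.log` file per port inside a directory per batch build) and the trailing `STATUS:` line are my own assumptions for illustration, not an existing MacPorts or MPAB convention.

```python
from pathlib import Path


def write_batch(root: Path, batch_name: str, results: dict[str, str]) -> Path:
    """Create one directory per batch build and one log file per port.

    The "STATUS: ..." trailer is a hypothetical convention for this sketch.
    """
    batch_dir = root / batch_name
    batch_dir.mkdir(parents=True, exist_ok=True)
    for port, status in results.items():
        (batch_dir / f"{port}.log").write_text(f"building {port}\nSTATUS: {status}\n")
    return batch_dir


def collect_failures(batch_dir: Path) -> dict[str, Path]:
    """Return {port_name: log_path} for every log whose last non-empty line is a failure."""
    failures = {}
    for log_path in sorted(batch_dir.glob("*.log")):
        lines = [ln.strip() for ln in log_path.read_text().splitlines() if ln.strip()]
        if lines and lines[-1] == "STATUS: failed":
            failures[log_path.stem] = log_path
    return failures
```

A status page or a mailer would then just be two different consumers of `collect_failures()`, which is why getting the logs onto disk in a predictable shape comes first.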

> * Compressing packages: I think it's a good idea use liblzma, like
> slackware and arch linux are doing. It has a great compression and low
> decompression time, and cost only a little more for compressing than
> usual.

I don't think it's actually that interesting to focus on the compression format in the first round - that strikes me more as premature optimization since we already have a number of serviceable archive formats right out of the box: tgz, tbz and xar files.  They all compress reasonably well and, frankly, disk space is not the first concern of the project since it has plenty (and if it needs more, no problem).  More importantly, you have much bigger problems than compression formats to worry about. :-)

> * Signed packages: I thought of it as signing at build time, with an
> GPG key. Not sure if it's the best.

Same as above.  Signed packages would be great, though I don't know about GPG - I think it would be better to use the more standard X.509 architecture, since it's what Mac OS X uses to validate pretty much everything.  The project would need a signing key and some means of protecting the private key, but that's not a particularly difficult requirement.  However, again, this is somewhat premature until we actually have a collection of packages to sign, and I think that's still a ways off for reasons that I'm almost ready to get to. ;-)

> *Others:
>     *What universal binaries and multiple variants on ideas list
> mean? I don't get the idea.

I think Anders already elaborated on this, but basically it's the question of whether or not to make packaged versions of ports with different variants set.  E.g. if a given port (let's call it sundae) has 3 variants:  +nuts, +fudge and +whipped-cream, do you build:

sundae (port with no variants and/or default variants)

Or do you build:

sundae+nuts
sundae+fudge
sundae+whipped-cream
sundae+nuts+fudge
sundae+nuts+whipped-cream
sundae+fudge+whipped-cream
sundae+nuts+fudge+whipped-cream

Such that the user can install any possible variation of the package?   I would argue, again, that this is a premature challenge to be taking on because of the complexity (and added build time) and, to start with, you should just build "sundae" and leave the variants as a second (or possibly never) step.
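To put a number on that build-time cost, here is a small Python sketch that enumerates every variant subset for a port.  It assumes all variants are independent (no conflicts or requirements between them), which real ports do not always satisfy; the function name is made up for this example.

```python
from itertools import combinations


def variant_combinations(port, variants):
    """Return every package name obtainable by enabling a subset of variants."""
    names = []
    for size in range(len(variants) + 1):
        for subset in combinations(variants, size):
            names.append(port + "".join("+" + v for v in subset))
    return names


packages = variant_combinations("sundae", ["nuts", "fudge", "whipped-cream"])
print(len(packages))  # 8: the plain port plus the seven variant builds listed above
```

With n independent variants you get 2**n packages per port, so a port with ten variants would already mean 1024 builds.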

>     *The language must be TCL? Shouldn't we think about using C
> instead in the distribution service to avoid performance issues? The
> compiling itself might be in any language, I think, but since we
> already have MPAB let's keep in TCL.

Now comes the hard part I've been hinting at from the very beginning.  How to actually unpack the archive "fully" on the target machine.  Let's take a relatively simple port like "ccal" as a working example.  I just built it from the port on my system and it generated the following archive file:

# xar -t -f /opt/local/var/macports/packages/darwin_10/x86_64/ccal/ccal-0.6_0.x86_64.xar 
+COMMENT
+CONTENTS
+DESC
+PORTFILE
+STATE
opt
opt/local
opt/local/bin
opt/local/bin/ccal

Looks pretty straightforward, eh?  Just unpack that xar file into /, remove those pesky little +* files afterwards, and now the user has ccal on their system - same as if they had said "port install ccal", right?  Wrong!  Trick question! ;-)  Let's look at the Portfile for ccal:

PortSystem          1.0
name                ccal
version             0.6
categories          sysutils
distname            ${name}06
maintainers         nomaintainer
extract.suffix      .py
description         ccal
long_description    Ccal, a curses-based calendar/journal/diary & \
                        todo list program.  

platforms           darwin
homepage            http://www.jamiehillman.co.uk/ccal/
master_sites        http://web.archive.org/web/20050926034036/www.jamiehillman.co.uk/ccal/
checksums           md5 d7318e1383ac4856f1294e6de0954e3f
use_configure       no

extract     {}
build       {}
destroot    {
        xinstall -m 755 -c ${distpath}/${distname}${extract.suffix} \
            ${destroot}${prefix}/bin/${name}
}
post-install {
    # Tell the user how to invoke and get help.
    ui_msg "type ``ccal'' and press '?' for help."
}

Now, we know that Portfiles are just Tcl scripts which use commands declared by the "Port system" (version 1.0 in this case).  In fact, almost nothing of what you see above is in the standard Tcl command set; it is part of the port system runtime instead.  The most relevant command here is at the very end:  post-install

That deceptively simple little command is the reason we don't have binary packages *already*.  Seriously.  If we hadn't hit this issue and gotten into a big argument about how best to solve it, there would have been binary package support in the very first version of MacPorts that Apple released!   The problem isn't the specific post-install rule that ccal is using, either - it's very simple, yes?  A simple ui_msg to tell the user how to use ccal.  The problem is that you can do *anything* from this rule and over 115 ports in the current collection do.  They use it for creating custom users in the accounts database (if you install postgres, for example, it needs a postgres user account to run as), they use it for emitting helpful "post installation comments" as you see above, they can create special symbolic links or create custom configuration files, etc.  Go grep for post-install in all of the ports today and you will quickly come to realize the scope of the problem, and that's just today with a comparatively small 7870 ports in the collection (FreeBSD has over 22000).
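If you'd rather count than grep by eye, a rough Python sketch of that survey follows.  It assumes the ports tree is laid out as `<category>/<port>/Portfile` and that a port declares the rule by starting a line with `post-install`, as ccal does above; it will miss anything declared in a more indirect way.

```python
import re
from pathlib import Path

# "post-install" as the first word on a line, as in the ccal Portfile.
POST_INSTALL = re.compile(r"^\s*post-install\b", re.MULTILINE)


def ports_with_post_install(ports_tree: Path) -> list[str]:
    """Return the names of ports whose Portfile declares a post-install block.

    The port name is taken from the directory holding the Portfile.
    """
    hits = []
    for portfile in sorted(ports_tree.glob("*/*/Portfile")):
        text = portfile.read_text(errors="replace")
        if POST_INSTALL.search(text):
            hits.append(portfile.parent.name)
    return hits
```

Reading through the bodies of the rules this finds is the quickest way to see how much arbitrary Tcl a binary installer would have to cope with.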

This isn't the only bit of "intelligent runtime" behavior, of course.  The package install command (let's call it ``pkg'') also needs to follow @pkgdep lines in that +CONTENTS file in order to install dependent packages (ccal doesn't have any, but most ports do), and possibly handle other bits of post-install behavior that folks have come up with (I think there's even a fancy way of adding user accounts now, but I could be wrong).  All of those commands need to be run at the very end of a "pkg install ccal" command, and that means one of two reasonable choices:

1. pkg needs to be able to instantiate a Tcl runtime environment that can pull in all of the port 1.0 Tcl files so that the post-install command (and any other special post-installation procedures) can run with the full environment, e.g. commands like "ui_msg" will be present and variables like ${name} will expand properly.

2. We need to create a different set of "installation commands or scripts" that are associated with a port and specifically designed to run at post-installation time, e.g. instead of a post-install command in Tcl, we would have some sort of "install script" that is bundled with the port and merely relies on a few key environment variables to be set in order to run.  Then all 115 ports which currently declare post-install need to be modified such that they use the external script instead, and the port(1) system needs to stop running (or looking for) the post-install command and instead chain to the script so that "port install blah" still does the same thing as "pkg install blah".
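As a rough illustration of choice 2, here is what the "chain to an external script" step might look like, sketched in Python.  The environment variable names (PORT_NAME, PORT_VERSION, PORT_PREFIX) and the script file name are invented for this example; nothing like them exists in MacPorts today.

```python
import os
import subprocess
import tempfile


def run_install_script(script_path, name, version, prefix):
    """Run a port's bundled install script with a few key environment variables set.

    The variable names are hypothetical; the script needs no Tcl runtime.
    """
    env = dict(os.environ, PORT_NAME=name, PORT_VERSION=version, PORT_PREFIX=prefix)
    result = subprocess.run(
        ["/bin/sh", script_path], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Shell equivalent of ccal's ui_msg post-install rule.
        script = os.path.join(tmp, "post-install.sh")
        with open(script, "w") as fh:
            fh.write("echo \"type '$PORT_NAME' and press '?' for help.\"\n")
        print(run_install_script(script, "ccal", "0.6", "/opt/local"), end="")
```

Both port(1) and pkg would call the same helper, which is what keeps "port install blah" and "pkg install blah" doing the same thing.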

Option #1 is the easiest from a perspective of not having to really change any existing ports, but then you need to solve the problem of how to install the ports infrastructure on a machine that doesn't already have it.  The actual ports runtime is less than a megabyte or so, and it would be easy enough to create a package out of it and then @pkgdep it from every port built by MacPorts (this also assumes that the bit of infrastructure you write to follow @pkgdep directives will not depend on the port system itself but be part of the "bootstrapper").
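For the bootstrapper piece, following @pkgdep directives without the port system is mostly a small graph walk.  The Python sketch below assumes each dependency appears in +CONTENTS as a line of the form `@pkgdep <package>`; that line format, the in-memory dictionary of +CONTENTS texts, and the package names in the usage note are all assumptions for illustration.

```python
def parse_pkgdeps(contents_text):
    """Return the package names listed on @pkgdep lines of a +CONTENTS file."""
    deps = []
    for line in contents_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "@pkgdep":
            deps.append(parts[1])
    return deps


def install_order(package, contents_by_package):
    """Return packages in an order that installs dependencies before dependents.

    contents_by_package maps a package name to the text of its +CONTENTS file.
    """
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in parse_pkgdeps(contents_by_package.get(name, "")):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order
```

Given a hypothetical "app" that depends on "libfoo" and "libbar", where "libfoo" also depends on "libbar", `install_order("app", ...)` yields libbar, libfoo, app.  Under option #1 the ports runtime package would simply be one more @pkgdep entry in every archive.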

Option #2 is the simplest in terms of not having to instantiate a full Tcl environment inside of pkg(1), but you'd then have to change the way all ports/pkgs are installed.

I honestly have no preference at this point, given that Landon and I managed to argue ourselves into a stalemate over the question of port runtime versioning, how to deal with different architectures (there's not just a Tcl piece, but also Pextlib) and blah blah blah.  In hindsight, I think we vastly overcomplicated the problem in our minds.  Should you choose to go with option #1, the actual implementation should be pretty easy to get to "good enough" status, particularly now that we've learned that the port system never versions (we're still at 1.0 how many years later? :)) and there's really only one architecture worth caring about now (x86_64).

Note that I'm also glossing over details concerning direct vs image mode and whether an activate step is needed in pkg(1), in which case everything I said above about post-install also applies to pre/post-activate commands.

I realize that this is a little long, but I figured the least I could do to atone for the lack of binary packages being in MacPorts from day 1 was to describe the problem space in as much detail as I could recall...  HTH!

- Jordan
