talkin' 'bout GSoC 2011

Bayard Bell buffer.g.overflow at googlemail.com
Tue Apr 5 04:18:30 PDT 2011


Jordan,

As far as the namespace and development practices go, I swear this isn't a question of smoking the opium pipe or building castles in the air, because it already exists. I've worked in an environment that built its infrastructure this way, and the reference to OpenEFS I provided previously was an attempt by people I know to bring this out of proprietary enterprise technology and into the FOSS world. See, for example, this 2010 presentation from the lead architect and implementer, reflecting on experiences building systems with the same objective on both AFS and NFSv3:

http://workshop.openafs.org/afsbpw10/talks/wed_1_1/AFSKerberos2010.pdf

Software distribution to the network, as opposed to individual systems, is one of the technologies that scales an infrastructure to 100,000 servers or more without crushing it with admin overhead. I'm a big believer in the need for migration of technology and architectural thinking between different kinds of organisations (government, universities, corporations, communities), which is something I think distinguishes Unix from the rest of the pack over the long haul. I'm just hoping to be part of a further iteration of that, in a part of the cycle where technology moves from the enterprise out to communities. While I've not seen OS X used in this kind of environment, nothing I've seen of it tells me it's fundamentally incapable. If there's a question of castles in the air, it's whether there's real interest outside of enterprises that have already been exposed to the benefits. Thus far, I'll admit, there's little demonstration of interest (as testified to by the pleas for interest in OpenEFS in the presentation just cited).

Dependency management is a crucial problem (as you note, if you want to reclaim old binaries, you need to know whether they're in use), and not even binary analysis can solve it completely (because of functionality like dlopen()), meaning that a robust solution ends up needing some level of audit support from the run-time linker/loader. A good start, however, is to retain data from build time (what link definitions did I give it?).
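
To give a flavour of what I mean by audit support in the loader: on OS X you can already get part of the way there with dyld interposing. Something like the following rough sketch (untested, and the log path is just a placeholder of mine) would record every dlopen() a process makes, catching the dependencies that never show up in the Mach-O load commands:

/* dlopen_audit.c -- rough sketch of a run-time loader audit shim for OS X.
 * Build:  cc -dynamiclib -o dlopen_audit.dylib dlopen_audit.c
 * Use:    DYLD_INSERT_LIBRARIES=./dlopen_audit.dylib some_binary
 * Every dlopen() the process makes is appended to a log, so you can see
 * dependencies that never appear in the binary's load commands.
 */
#include <dlfcn.h>
#include <stdio.h>

static void *audit_dlopen(const char *path, int mode)
{
    FILE *log = fopen("/tmp/dlopen_audit.log", "a");  /* log path is arbitrary */
    if (log) {
        fprintf(log, "dlopen: %s\n", path ? path : "(null)");
        fclose(log);
    }
    return dlopen(path, mode);  /* the interposing library itself still sees the real dlopen */
}

/* dyld reads replacement/replacee pairs out of the __DATA,__interpose section
 * and swaps them in for every other image loaded into the process. */
__attribute__((used)) static struct {
    const void *replacement;
    const void *replacee;
} interposers[] __attribute__((section("__DATA,__interpose"))) = {
    { (const void *)(unsigned long)&audit_dlopen, (const void *)(unsigned long)&dlopen },
};

It's only a shim, of course; a real audit facility would want the loader itself to emit this sort of record, so it can't be bypassed or forgotten.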

I've had some experience with VFS hacking via LD_PRELOADs to change the filesystem image on a per-process basis: it's useful for certain kinds of testing, but I don't think it scales well against the larger problem space. You still need a namespace for managing the physical storage, so the question is what benefit you get from logically remapping that for applications. My initial thought would be that virtualisation is most effective when the unit of consumption is less than the unit of provisioning and/or allows resource sharing between consumers, and I think the problem here is that you'd end up without any net reduction in complexity and probably an increase, as I think you've intuited already (e.g. if you use that namespace to help you find something like debug symbols, you have to map relative references to a binary to the same physical path, but how do you know when to use the virtual rather than the physical reference when attempting namespace-based affordances rather than strict relationships?). I'd expect that a combination of filesystem indexing/search, evolution of binary formats, and the run-time linker/loader is worth further exploration (e.g. have the filesystem keep signature data so that fingerprinting is nearly free, index binary header information so the linker can search it, maybe allow link definition by signature and unqualified object name), but that is definitely the part where we're passing the pipe.
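
For anyone who hasn't seen the trick, the LD_PRELOAD approach looks roughly like this on a glibc system (a minimal, illustrative sketch; the environment variable names and the prefix-remapping policy are made up for the example):

/* remap_open.c -- minimal sketch of per-process path remapping via LD_PRELOAD
 * (Linux/glibc flavour; on OS X the analogue is DYLD_INSERT_LIBRARIES plus
 * dyld interposing, as in the earlier sketch).
 * Build:  cc -shared -fPIC -o remap_open.so remap_open.c -ldl
 * Use:    REMAP_FROM=/opt/local REMAP_TO=/opt/depot/foo-1.2 \
 *         LD_PRELOAD=./remap_open.so some_binary
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    const char *from = getenv("REMAP_FROM");   /* e.g. /opt/local        */
    const char *to   = getenv("REMAP_TO");     /* e.g. a per-port prefix */
    char buf[4096];

    if (path && from && to && strncmp(path, from, strlen(from)) == 0) {
        snprintf(buf, sizeof(buf), "%s%s", to, path + strlen(from));
        path = buf;                            /* rewrite the prefix      */
    }

    if (flags & O_CREAT) {                     /* open() may carry a mode */
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        va_end(ap);
        return real_open(path, flags, mode);
    }
    return real_open(path, flags);
}

A real version has to cover the whole family of calls (open64(), fopen(), stat(), opendir(), the exec*() family, and so on), which is a good part of why I don't think it scales.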

I don't seem to be alone in saying that source integrity is an issue that FOSS projects need to address. See, for example, slides 20 and 21 of David Wheeler's presentation to MIL-OSS (the US military project for the use of open source software):

http://www.slideshare.net/onesimplehuman/oss-security-miloss2010

If I'm a little crazy in thinking this, I can at least claim some reputable company. And perhaps some disreputable company, too: there's evidence that attackers are interested in source exploitation, both closed (Google and RSA, where the attempts look to be about reconnaissance) and open (Linux and PHP, where attempts were made to compromise source code), and the vectors being used are clearly able to evade build hardening and firewalls. If you want a case study in how you can deploy all these things and still end up pwned, have a look at:

http://arstechnica.com/tech-policy/news/2011/02/anonymous-speaks-the-inside-story-of-the-hbgary-hack.ars

Sandboxing isn't a comprehensive solution, either. It's been used extensively to deal with problems of mobile code (Java, web technologies), and there are examples in OS X of how MAC can be used to improve the security of software with well-defined and limited dependencies (bootpd, krb5kdc). It tends to require a lot of domain-specific adaptation (it's been a lot of work to implement this in browsers), and it doesn't scale well for complex distributed applications. (I don't have the references to hand, but there have been a few papers from an Australian researcher working with SELinux on a security architecture for a national system of online patient medical records, and what he found was that MAC was too complex to be used for granular controls, thus limiting its application to compartmentalisation and requiring proxies to manage data transfers between compartments.) Projects like OpenBSD have shown that they can provide very effective solutions by using cryptography and privilege separation, and my understanding is that this ends up being a good deal less resource-intensive (although I'm not claiming it's a general solution, just that it can be a shorter path to a lot of the benefits of MAC, particularly those not required outside TLAs, and easier to audit as an implementation). A big part of security is understanding what makes people feel secure while leaving them open to exploitation. Problems associated with vectors like buffer overflows have been reduced over the last decade (although certainly not eliminated), but that hasn't necessarily left us with a net reduction in attack surface, just its migration to other areas of software architecture, implementation, and deployment.
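
To illustrate why I think privilege separation can be a shorter path: the pattern is essentially a small, auditable privileged parent and an unprivileged worker talking over a socketpair, roughly along these lines (a toy sketch of the general idea, not anyone's production code; the uid and the one-byte request protocol are placeholders):

/* privsep.c -- toy sketch of the privilege-separation pattern: a small
 * privileged parent holds the sensitive resource, an unprivileged child
 * does the exposed work and asks for it over a socketpair.
 * (Must be started as root for the privilege drop to succeed.)
 */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define UNPRIV_ID 65534   /* placeholder: "nobody"/"nogroup" on many systems */

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {                        /* child: drop privileges, do the work */
        close(sv[0]);
        if (setgid(UNPRIV_ID) || setuid(UNPRIV_ID)) {
            perror("drop privileges");
            _exit(1);
        }
        /* the exposed, attacker-facing logic lives here; when it needs
         * something privileged it writes a request and reads a reply */
        write(sv[1], "R", 1);
        char reply[64] = { 0 };
        read(sv[1], reply, sizeof(reply) - 1);
        printf("child (uid %d) got: %s\n", (int)getuid(), reply);
        _exit(0);
    }

    close(sv[1]);                          /* parent: small, auditable, privileged */
    char req;
    if (read(sv[0], &req, 1) == 1 && req == 'R')
        write(sv[0], "secret", 6);         /* e.g. sign something, bind a low port */
    waitpid(pid, NULL, 0);
    return 0;
}

The attacker-facing complexity all lands in the unprivileged child, and the part you have to get right is a few dozen lines you can actually read.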

I think the most significant obstacle to Unix security is its ongoing treatment as a matter of competitive advantage. It's fine for platforms to go their own way and innovate, but you also need to see stages following that where standards are agreed so that solutions are portable and commodified. There are certainly things to like about the iOS model, but having a strictly application-specific storage model, with limited provision to open documents in registered UTI handlers, tells me that it's not really aiming to solve the problems of Unix applications but to define an application security model for a mobile platform. I mean, if the result is that there's no straightforward way to support functionality I'd consider basic, like GPG/PGP signing and encryption, I don't see that iOS has a solution adequate on its own terms.

Cheers,
Bayard

On 3 Apr 2011, at 00:01, Jordan K. Hubbard wrote:

> 
> On Mar 29, 2011, at 3:51 PM, Bayard Bell wrote:
> 
>> I've been somewhat active on MacPorts previously and am interested in becoming more active via GSoC 2011 in particular.
>> [ ... ]
> 
> Wow, that was some read, but I think we're on the same wave-length with a lot of the ideas you expressed in your introduction!
> 
> For example, I've been whining for (some would say ``championing'') the notion of  supporting direct dependencies in the depot for years, hoping we might be able to someday support the notion of having multiple versions of a given port instantiated simultaneously rather than having everything just collide in a single prefix location, but it's also never been an idea without its own unfortunate pitfalls.  Without filesystem support to synthesize namespaces on a per-process-hierarchy basis during builds, for example, all of the paths need to be called out explicitly at compile/link time (-L/opt/local/var/db/.../somefoopkg -lfoo -L/opt/local/var/db/.../somebarpkg -lbar ... and so on for include paths as well), and at some point the actual maximum length of a process argument list becomes a concern.  I know, ugh.  It also makes upgrading ports somewhat more "interesting" as you have multiple moving parts in the dependency list now and you need to figure out which are "floating" (can be the latest version, possibly allowing an older version to be GC'd) and which are hard dependencies that can never move - all of that generally ends up being driven by additional metadata provided by the port maintainers after much sweat and blood figuring out which combinations work and which don't.  Or they do it once and never look at it again, leading to a lot of old library versions lying around in desperate need of GC'ing.
> 
> Of course, if we're just passing the opium pipe here, then we could also imagine a world where the OS provided dynamic filesystem namespace manipulation, allowing us to simply map the dependent packages straight into "/opt/local" (since each package's binaries would see their own view), much in the same way that dyld brings in shared libraries for a process at runtime, and every executable installed by MacPorts would see only its own dependency graph (and nothing more or less) in /opt/local at execution time.  Since the OS does not provide this and we're all out of opium anyway, perhaps the next best thing would be to make /opt/local a MacFUSE mount and start signing binaries such that the FUSE process can figure out how to display a different view based on who's asking.  We're still building castles in the air of course, but it's nice to imagine being able to at least prototype such a thing, just to see if all of the combinatorial flexibility was awesome or, in deployment, a huge pain in the debugger.
> 
> Security is another category all its own, and yes, sandboxing is going to become an ever-more important consideration as things evolve.  You've focused mostly on the build-time sandboxing issues in your message, but I don't actually think that's where the real action is going to be.  Going out of our way to secure what should increasingly be a back-room process done behind firewalls on controlled build machines (in order to create binary packages targeted towards end-users) is probably not going to generate much ROI when weighed against the challenges of sandboxing the runtime environment for the packages and/or resulting binaries of "port install" themselves.
> 
> I think iOS has the right app delivery and trust model here, as far apart as MacPorts and iOS might currently seem, if anyone in MacPorts is truly interested in thinking that far ahead on future architectural designs.   Putting the Unix (security model) genie back in the bottle might seem a daunting task, and for arbitrary command line tools it may never be resolved (though I have some ideas), but for arbitrary Cocoa apps that day may be getting closer.  Such apps may not constitute the bulk of MacPorts' collection today, but I can also imagine that changing over time.  Just food for thought!
> 
> - Jordan
> 
