Package managers and package versioning
ryandesign at macports.org
Sun Oct 4 21:34:00 UTC 2020
On Oct 4, 2020, at 14:09, Jason Liu wrote:
>> Where by "this" you again mean the ability to specify a previously-available version, not an arbitrary version that was never in a portfile.
> Yes, of course. If you look at the folder for the Blender package on the Debian repo
> from my previous example, there are many versions of Blender that were skipped. For instance, they skipped version 2.82 and went from 2.79b right to 2.82a. And they skipped versions 2.83.0-2.83.4, and went from 2.82a right to 2.83.5. Obviously, there would be no way for a user to try to install Blender version 2.83.2 using apt-get, because that version was never packaged (i.e. spec files don't exist) for that particular version of Blender.
Let's try not to say "of course". Nothing is obvious in this discussion. Let us be very clear at every step what we are proposing so that we can have a meaningful discussion.
It's my understanding, for example, that Homebrew allows the user to request to install the git master of a particular Homebrew formula, in effect allowing the user to (try to) install a version of the software that was never previously attempted by the formula's creator. This is what I was afraid was part of what was being proposed, and something I would not support adding to MacPorts.
> The examples were meant to illustrate the fact that it's possible to restrict the dependency versions when installing an old package. By specifying maximum versions for all of a package's dependencies (by using something like <=2.4.0 or <2.4.0), it would guarantee that the old package would never have to be built against new dependencies, because if a user were to install the old package, it would also install older versions of the dependencies.
I understand that it would be possible to define a syntax for specifying dependency versions and that MacPorts base could be enhanced to parse it and even to find and download the right version of the portfile and to install different versions of ports to different locations.
The problem is that if you are currently compatible with the latest version of a dependency, you do not know in advance whether you will be compatible with the next version of that dependency.
So you can either be conservative or liberal. MacPorts has tended to be liberal in these matters, assuming that a port can build on any architecture, any OS version, with any compiler. Once it is discovered that it can't build on some subset of those, restrictions can be added. supported_archs can be used to rule out incompatible architectures. compiler.blacklist can be used to rule out incompatible compilers. Ports can indicate which OS versions they're compatible with, though not as elegantly as I would like.
If we continue the liberal approach with dependencies, then we begin from the assumption that a port can use any version of a dependency. Let's say libpng declares it can use any version of zlib. Then years later zlib 2 is released and it is incompatible, breaking all versions of libpng we've published. There's now a lot of work to go edit each previous libpng to indicate that it requires zlib < 2.
Alternatively, suppose we use the conservative approach and declare that libpng requires zlib < 2. Or zlib < 1.3. Or zlib < 1.2.12. We have no way to know in what future version of zlib, if any, incompatibility might arise. Suppose we declare compatibility with zlib < 2, and years later zlib 2 comes along and maintains perfect backward compatibility with zlib 1. Then our libpng would continue to use the old zlib 1, even though it would work with zlib 2, thus making users use outdated dependencies unnecessarily and possibly exposing them to bugs that have already been fixed in newer versions. And the solution then is to do a lot of work to go edit each previous libpng to update its zlib compatibility list.
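To make the trade-off concrete, here is a minimal sketch of how an upper bound on a dependency changes which version gets picked. It is not MacPorts code: the helper names `parse` and `newest_allowed` are made up for illustration, and the "2.0" entry stands in for the hypothetical zlib 2 discussed above.

```python
# Illustrative sketch only; not part of MacPorts base.
# "2.0" is the hypothetical future zlib 2 from the discussion.

def parse(version):
    """Turn a dotted version string like '1.2.12' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def newest_allowed(available, upper_bound=None):
    """Return the newest available version strictly below upper_bound.

    With upper_bound=None (the liberal approach) any version is allowed.
    Returns None if nothing satisfies the bound.
    """
    candidates = [
        v for v in available
        if upper_bound is None or parse(v) < parse(upper_bound)
    ]
    return max(candidates, key=parse) if candidates else None

zlib_versions = ["1.2.11", "1.2.12", "2.0"]

# Liberal: no cap, so libpng is built against zlib 2 whether or not it works.
liberal = newest_allowed(zlib_versions)

# Conservative: "zlib < 2" keeps libpng on zlib 1 even if zlib 2 would work.
conservative = newest_allowed(zlib_versions, "2")

print(liberal, conservative)
```

Neither result can be known to be right in advance: the liberal pick may turn out to be incompatible, and the conservative pick may turn out to be needlessly old.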
Either way, it's signing all port maintainers up for a lot of work. (Many port maintainers already don't do the currently expected amount of work to maintain their ports.) And all for the benefit of other ports being able to declare that they require an old version of libpng. We did have this problem once before, when some old ports were not compatible with libpng 1.4 when it came out. Rather than introducing a libpng12 compatibility port, we did the right thing and fixed other software to be compatible with libpng 1.4. If we had used a libpng12 port instead, we likely would never have noticed if and when those ports gained libpng 1.4 compatibility and would be forcing libpng12 on users unnecessarily for years to come.
> This is one place where MacPorts' activate/deactivate would really shine: If someone wanted to install an old package, MacPorts would first deactivate all of the dependencies that were too new, and then install the old versions of the dependencies that are compatible with the old package.
If all versions of a port install to the same location on disk as they do currently and only one version of a port can be active at a time, then your proposal here -- "deactivate all of the dependencies that were too new" -- will break all the ports that depend on the newer dependencies.
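A small sketch of that conflict, assuming hypothetical ports `oldapp` (needs libpng < 1.6) and `newapp` (needs libpng >= 1.6) and a single active libpng. The names and the `(min, max)` requirement format are invented for illustration; MacPorts has no such syntax today.

```python
# Illustrative sketch only; port names and the requirement format are hypothetical.

def parse(version):
    """Turn a dotted version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# Each dependent maps to (minimum inclusive, maximum exclusive) libpng version.
DEPENDENTS = {
    "oldapp": (None, "1.6"),   # only works with libpng older than 1.6
    "newapp": ("1.6", None),   # needs libpng 1.6 or newer
}

def unsatisfied(active_version, dependents):
    """List dependents whose requirement is not met by the one active version."""
    active = parse(active_version)
    broken = []
    for name, (lowest, highest) in dependents.items():
        if lowest is not None and active < parse(lowest):
            broken.append(name)
        elif highest is not None and active >= parse(highest):
            broken.append(name)
    return sorted(broken)

print(unsatisfied("1.6.37", DEPENDENTS))  # with the new libpng active
print(unsatisfied("1.2.57", DEPENDENTS))  # after deactivating it for the old one
```

Whichever libpng is active, one of the two dependents is broken, which is the problem with deactivating newer dependencies when only one version can be active.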
I thought it was being proposed that all versions of a port would install to different locations on disk so that many could be active simultaneously, which would avoid the one problem while causing many new ones, most of which we haven't begun to discuss yet.
>> Despite the bloat it might work for that use case, but some npm module developers unnecessarily restrict the versions of dependencies they're compatible with, meaning that you get an unnecessarily outdated version of that dependency, even though a newer version of the dependency would work and might fix some bug you're experiencing. And this ballooning of dependencies can use a tremendously larger amount of disk space than if just a single compatible copy of each dep had been installed. We already have users complaining about unnecessary dependencies and unnecessary disk usage; this would just make it worse.
>> And do you really want to install a separate copy of gettext for each dependency that uses it?
> Well, let's be frank here. This is the "Apple way". Other than the system frameworks, Apple has always strongly encouraged that software keep all of its bits and pieces inside of its application bundle in the /Applications folder. This has traditionally meant that multiple apps would have their own copy of the same gettext library in each of their application bundles; because Apple has never said "hey developers, put all of your shared libraries in this /usr/local or /opt/local folder". Instead, they've always gone the route of "each and every app only gets to play in its own sandbox; you should bring all of your own toys (i.e. libraries) to your own sandbox... no sharing of toys! (other than the ones Apple provides and controls, i.e. system frameworks)"
> I'm not saying that I agree with Apple's viewpoint, but I do find it a bit odd that users would use a *nix-like argument of "share as much as possible to reduce bloat" to complain about something (MacPorts) that lives on an Apple Corporation-made system, where "share as little as possible" is the norm. MacPorts is already a shining example of sharing and reusing the same libraries on a system that highly discourages such behavior.
What you say about the "Apple way" of doing application bundles is true, but it in no way follows that we should do the same in MacPorts.
Consider an application like Firefox, developed by an organization, using various open source libraries, which are all included in the application bundle. But that's it. There's only one copy of each dependency because the developers know that they are creating a single product and that all the dependencies need to work together. They can hold back updating certain dependencies if they know it will cause problems for other libraries they use. They can exclude manpages, programs or other files they don't need and distribute only what's needed for their specific use case within Firefox. They have a narrow focus.
For us, the single product we are providing is MacPorts, which has a very broad focus: ten thousand software packages intended to work together. That goal isn't always met: sometimes updating one port breaks another, until we fix it. But treating each of those ten thousand ports as a separate top-level product that needs its own world of dependencies would be an unfathomable disaster.
Installing all ports to a different top-level prefix for that port name and version, which is what I thought we were talking about initially, is slightly less disastrous. For example, if we had gone the other way with the libpng example above, we could have prefixes /opt/local/port/libpng/1.2.57 and /opt/local/port/libpng/1.6.37, each of which would contain the bin, include, lib, etc. directories for that version of libpng. Then the majority of ports that are fine with 1.6 could use that and those that need 1.2 could use that, and then that would only be two copies of libpng installed instead of a thousand.
Don't forget also that separate libraries would be loaded into memory separately. If you have a zillion copies of gettext and libpng installed, one for each port that uses them, and then you want to use something huge like Qt or WebKit, then you're loading hundreds of copies of gettext or libpng into memory.
>> But you haven't addressed how you could build historical "Portfiles" on a newer OS or compiler version when they would not have been written to account for the new restrictions imposed by that newer OS or compiler version, and we go back to my supposition that a portfile maintainer would be asked to keep updating and patching ever-older versions of their portfile.
>> How would you address the "implicit declaration of function" problem I mentioned? A new version of Xcode, 12, has come along and shown us that a zillion of our ports, such as php, do not include the headers that declare the functions they are using. We must fix this for compatibility with future Macs.
>> This is just an example of the type of problem that comes up all the time. Yosemite required tons of patches because many build systems misinterpreted 10.10 as 10.1. macOS 11 is requiring tons of patches because many build systems expected the macOS major version to remain at 10 forever. Many build systems need patches now for ARM support. etc. etc. etc.
> In this particular case, I believe that the Xcode version, and even the macOS version, should be counted as just another type of dependency... one that happens to be required in every single portfile. If these (required) variables also had the ability to specify version ranges, we could effectively cap each version of a portfile with a maximum compatible version of macOS and Xcode, which could be raised as compatibility patches are added for newer macOSes and Xcodes.
> macos_version_compatibility <=10.15.7
> (I prefer to use Darwin version)
> darwin_version_compatibility <=19
> (this portfile would be compatible up to Catalina, but not yet compatible with macOS 11)
> xcode_version_compatibility >=8.2.1 && <=12beta2
> (or whatever)
Agreed we should have a way to specify OS version compatibility with a variable. It is a long-standing wish. There is a ticket open about maybe using the currently-unused platforms variable for this purpose. Many ports do indicate OS compatibility but with manually-written code blocks.
We already have the supported_archs variable for indicating architecture support.
Xcode version compatibility has historically not been so interesting; it's usually the compiler and SDK version that's relevant. We already support restricting which compilers can build a port, using compiler.blacklist. I'm working on something similar for SDK versions. We do already have the xcodeversion portgroup for restricting Xcode versions, but it is not usually necessary and it is old and does not account for the fact that the command line tools are now separate from Xcode and that one might want a way to specify "Xcode < 12 or the command line tools corresponding to Xcode < 12". For completeness with other version checking variables this capability probably belongs in MacPorts base, but the same can be said for many of the portgroups.
> The benefit of this would be as follows. Let's say that we went to a system of having one portfile for each version of a particular package. Let's also say that "somepackage" was currently on version 2.4.3, and it is not compatible with macOS 11. If the upstream authors of "somepackage" were to add patches that made it compatible with macOS 11 and released it as 2.4.4, then the new portfile for 2.4.4 would increment darwin_version_compatibility <=20. However, the portfile for 2.4.3 wouldn't need to be changed at all, because it wasn't, and never will be, compatible with macOS 11. If a port maintainer were to come along and backport the macOS 11 patches into the 2.4.3 port, then they could update the 2.4.3 portfile to also be compatible with darwin <=20. A similar concept applies to Xcode version, ARM support, etc.
> Thus, the idea of keeping a separate portfile for each version of a package would answer the question of "how you could build historical "Portfiles" on a newer OS or compiler version", because the answer is: you never would. Once a "Portfile-2.4.3" was pegged to a maximum of Catalina, it would potentially never need to be updated again, because it is not compatible when macOS 11 gets released. Our new "port" command would be able to indicate to users on a macOS 11 machine that only "Portfile-2.4.4" was available, because the old historical "Portfile-2.4.3" isn't compatible with macOS 11.
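The selection behavior proposed in the quoted text could be sketched roughly as follows. This is only an illustration: the `PORTFILES` table and the `installable_versions` function are hypothetical, and the caps mirror the quoted `darwin_version_compatibility` example for "somepackage".

```python
# Illustrative sketch only; nothing here exists in MacPorts base.
# Maps each hypothetical "somepackage" version to its maximum Darwin version,
# as in the quoted darwin_version_compatibility <=19 / <=20 example.
PORTFILES = {
    "2.4.3": 19,  # compatible up to Catalina (Darwin 19)
    "2.4.4": 20,  # compatible up to macOS 11 (Darwin 20)
}

def parse(version):
    """Turn a dotted version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def installable_versions(portfiles, host_darwin):
    """Return the versions whose Darwin cap admits the host, oldest first."""
    allowed = [
        version for version, max_darwin in portfiles.items()
        if host_darwin <= max_darwin
    ]
    return sorted(allowed, key=parse)

print(installable_versions(PORTFILES, 19))  # Catalina host
print(installable_versions(PORTFILES, 20))  # macOS 11 host
```

On a Darwin 20 host only 2.4.4 is offered, while a Darwin 19 host can choose either version.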
>> This is all work that we are here volunteering to do to improve our current port collection, but asking everyone to retroactively also fix these problems in all older versions would not be acceptable. We don't have a sufficient number of contributors to keep our current collection of ports up to date, so we certainly don't have enough contributors to keep all previous versions of our collection working as well.
> This is why it would be important in a "new" MacPorts to be able to cap old versions of a portfile with a maximum compatible version of stuff. ("stuff" meaning dependencies, macOS version, Xcode version, etc.) This would allow older versions of portfiles to slowly fade into obsolescence as their maximum compatible versions of dependencies become more and more out-of-date. There would be no need to keep retroactively updating them once they've been capped with maximum compatible versions.
> If compatibility patches can be backported, then great, but if they can't be for any reason, then that's fine too, the old versions of the portfiles will continue to work for older versions of macOS/Xcode. But in my opinion that would be a much better situation than what exists now, where there's only a single Portfile (and a single version of the package) for all users on all versions of macOS. However, I once again reiterate that I also believe it would require a massive amount of effort to implement this new regime.
>> You might even have to compile each of those copies from source, since each copy is going to a different location, and MacPorts binaries are not designed to be relocatable. Making our binaries relocatable is something I'm interested in but that is also a large challenge.
> MacPorts binaries aren't designed to be relocatable, because MacPorts, in my opinion, operates under a very *nix-like philosophy of sharing and reusing resources. This is also why many macOS apps are relocatable: because they don't share resources with anything else, and thus can play in their own sandbox. Regardless of where you relocate this sandbox to, there is very little chance of the app breaking when you move it, because all of its toys (like the gettext discussed above) move along with it. In many cases, sharing/reusing resources and relocatability are direct trade-offs and in opposition to one another. (Obviously, things like @rpath, @executable_path, @loader_path, etc. were designed to help break this linkage/trade-off.)
All of this is true and none of it addresses the point I was making.
In this paragraph I was responding to the comparison to npm, a package manager using a hierarchical dependency scheme, or whatever you want to call it, and why it would not work well for us, as far as I can tell.
Let me try an example.
Suppose I have zlib installed at:
Now suppose I want libpng and it installs to:
It depends on zlib, and so as not to conflict with anyone else's zlib it installs its own copy:
Now I want to install gd2 and it installs to:
It depends on zlib so it installs:
It also depends on libpng so it installs:
We now have lots of extra disk usage on the user's system and lots of extra memory usage for loading all those redundant libraries. We also have the problem of how all of the above got installed. There are a few possibilities:
1. The user had to build everything from source because we don't have binaries. That's a step back from where we are now. Users like binaries.
2. We had to build each of those permutations of libpng, zlib, and everything else on our build server and distribute each of those specific binaries that are tied to those locations. That's a step back from where we are now in terms of wasting server build time and disk space.
3. We build just one libpng, zlib, etc. on our build server and we have to figure out how to make the packages relocatable. This is an interest of mine but we've done nothing yet to try to make this work. Note that just the task of identifying all Mach-O files and fixing the install_names and library linkage is easy; dylibbundler shows us it can be done. The hard part is figuring out where else the prefix might be baked in and how to solve it. Even MacPorts base itself is not relocatable; see https://trac.macports.org/ticket/56204 for why this can be a tougher problem than a simple find-replace in the destrooted files.
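To put numbers on the zlib / libpng / gd2 example above, here is a minimal sketch comparing the two layouts. It assumes that each requested port carries one private copy of every package in its own dependency tree; the function names are made up for illustration, and the dependency table contains only the three ports from the example.

```python
# Illustrative sketch only. Assumption: in the hierarchical layout, each
# requested port gets one private copy of every package in its dependency tree.
DEPENDS = {
    "zlib": [],
    "libpng": ["zlib"],
    "gd2": ["zlib", "libpng"],
}

def closure(port, depends, seen=None):
    """Collect port plus all of its direct and indirect dependencies."""
    seen = set() if seen is None else seen
    if port not in seen:
        seen.add(port)
        for dep in depends[port]:
            closure(dep, depends, seen)
    return seen

def private_copies(requested, depends):
    """Installs needed when every requested port brings its own dependencies."""
    return sum(len(closure(port, depends)) for port in requested)

def shared_copies(requested, depends):
    """Installs needed when all ports share one copy of each dependency."""
    seen = set()
    for port in requested:
        closure(port, depends, seen)
    return len(seen)

requested = ["zlib", "libpng", "gd2"]
print(private_copies(requested, DEPENDS), shared_copies(requested, DEPENDS))
```

With just these three ports the hierarchical layout already needs twice as many installs as the shared one, and the gap grows with every additional port that uses zlib or libpng.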
So I hope this npm-style hierarchy is not what anyone is proposing.