Re: The future of the Golang Portgroup ― what to do with offline builds?
Austin Ziegler
halostatue at gmail.com
Sat Feb 17 03:09:33 UTC 2024
On Fri, Feb 16, 2024 at 6:41 PM Aaron Madlon-Kay <amake at macports.org> wrote:
> I’m the author of go2port and the original author of the golang-1.0 port
> group.
>
We've interacted on GitHub, and your assistance helped me understand quite a
bit more. There's one particular failure mode with go2port (I believe it is
the translation from google-apis URLs to GitHub, which drops some subpaths)
that I was developing a fix for, but I ran into various issues and then ran
out of time because of work projects.
> The reasons for preferring “offline” builds have been discussed ad
> nauseam, but the discussions that I am aware of are dispersed across Trac,
> GitHub, the mailing list, etc., so it would probably be a good idea to put
> together a FAQ. Note that as I recall the idea of standing up a
> MacPorts-specific goproxy was raised and rejected.
>
A FAQ for this would definitely be beneficial.
> About go2port and the golang port group in particular:
>
> What you describe is *not* a failing of go.vendors per se; it *is* a
> somewhat common failure mode of go.vendors *as generated by go2port*.
>
I disagree. `go2port` is doing a fairly reasonable job, for all that the
resolution is naive. The go team makes it impossible for the module
resolution logic to be reused because all of that is in an `internal/`
directory, and it's spread across 3–6 different "modules" (I was *also*
looking at trying to enhance go2port with the actual golang module
resolution logic, but as noted, I ran out of time). Having spent several
hours with it, I believe that `go.vendors` handling is fundamentally broken
(even with the patch I tried, which improves some of the handling):
- it uses the GitHub legacy archive model, which is discouraged in the
github portgroup in favour of release or archive downloads; this cannot be
changed without either adding a new flag (download_from archive) or updating
all go ports that use `go.vendors`.
- it has no way of handling subpaths within a larger project, which results
in *massive* overuse of bandwidth as I outlined in my last message.
I suspect that there are a few other issues, but the download and disk
space usage for source builds with this were insurmountable from my
perspective.
> go2port implements a naive subset of go’s actual dependency resolution
> logic. It works well for some subset of projects that have relatively
> simple dependencies. It falls down in some cases, notably when depending on
> subpaths within a larger project, and especially when requiring different
> versions of said subpaths.
> There are some ports where the shortcomings of go.vendors-by-go2port can
> be manually overcome without too much trouble. I maintain terraform-ls,
> which is one such port. It generates go.vendors about 80% correctly, and
> then I go in and do some manual fixes. It’s annoying, but much better than
> trying to write go.vendors from scratch, and still gives all the benefits
> of an offline build.
>
I agree. It may not sound like it, but I'm trying to figure out how this
situation can be fixed. Even when working with simple repos, the go.vendors
downloads are larger than they need to be (go get will not download
anything that is ancillary to the retrieved code, such as the README; the
github archive will contain the README and other folders not part of the
retrieved code).
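A minimal sketch of how that overhead could be measured for a given archive, assuming Python's standard `tarfile` module. The function name `subpath_share`, the demo archive, and its file names are made up for illustration; the sizes compared are uncompressed file sizes, not download sizes.

```python
import io
import tarfile


def subpath_share(archive_bytes: bytes, subpath: str) -> tuple[int, int]:
    """Return (bytes under subpath, total bytes) for regular files in a tar.gz.

    Assumes the archive wraps everything in one top-level directory
    (as a GitHub-style "repo-<ref>/" archive does), so that first path
    component is stripped before comparing against `subpath`.
    """
    wanted = subpath.strip("/") + "/"
    used = total = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop the archive's top-level directory.
            _, _, rel = member.name.partition("/")
            total += member.size
            if rel.startswith(wanted):
                used += member.size
    return used, total


def make_demo_archive() -> bytes:
    """Build a tiny in-memory archive with made-up contents."""
    files = {
        "repo-abc123/README.md": b"x" * 100,
        "repo-abc123/docs/guide.md": b"x" * 500,
        "repo-abc123/pkg/a/a.go": b"x" * 50,
        "repo-abc123/pkg/b/b.go": b"x" * 350,
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    used, total = subpath_share(make_demo_archive(), "pkg/a")
    print(f"{used} of {total} bytes ({used / total:.0%}) belong to pkg/a")
```

Running the same function over a real distfile and the subpath a port actually imports would give a rough per-dependency overhead figure.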
> Then there are some ports where such manual fixups are too cumbersome, and
> the maintainer has chosen to forgo go.vendors and allow “online” builds.
> Please note that this *is* an option for you. No one is forcing you to use
> go.vendors (though MacPorts as a project does strongly prefer offline
> builds; so far my understanding is that it’s better to have an online-built
> port than no port at all).
>
In the example I used first, devel/please, there was a substantial amount
of effort to get it working in the first place (not by me). As I was
trying to upgrade it (it has a more-or-less mandatory two-phase,
partially online build), I got frustrated with downloading the "same" 50 MB
archive when the actual dependency is 1/10th the size or less, which is why
I started this discussion. I *want* to use offline ports. I recently
contributed an expansion to `cargo2port` so that it could be used more
easily for rust ports that require more than one Cargo.lock file (rust
itself is one of these ports).
But I can't justify a 10x download volume when Go's module resolution
system provides the same basic protection that MacPorts offline builds do.
> So I feel like saying “don’t use go2port” or “don’t use go.vendors” is not
> a reasonable position. It works very well for some subset of go ports: I
> count 481 ports using the golang port group; of these I count 230 using
> `go.offline_build no` (I suspect a lot of these *could* be made to work
> offline, but the maintainers didn’t want to deal with it). Instead of
> saying “don’t use it”, I would suggest:
>
> - Help make the shortcomings more apparent, so people aren’t surprised
> like you were
> - Help improve go2port to make the subset of nicely handled ports larger
>
go2port does need improvement, but as you say, it's good for single-repo
dependencies, even on high-dependency count repos like chezmoi (I haven't
looked at the chezmoi port, although as a maintainer of chezmoi I should
probably at least offer to keep that one up to date).
I don't think the major flaws are in go2port; I think they are in
go.vendors, because it uses a full repo archive download (and I get it:
that's all that GitHub *offers*).
> - Work with the golang people to make their module system more friendly
> for from-source package managers like MacPorts
>
That's extremely unlikely to happen, IMO. It took them 10-11 releases to
admit that they needed a module lock system (around 1.9/1.10) and another
three releases to stabilize it (1.11, 1.12, 1.13). The dependency system
has *always* been VCS-based (git, hg, etc.), and digging through
https://github.com/golang/go/blob/master/src/cmd/go/internal/modfetch/codehost/git.go
suggests that the go dependency resolution is starting from a bare git repo
and manipulating refs *manually* to get just the files needed starting from
the file (an earlier workflow that approximates the modern
`sparse-checkout`, I think).
It honestly feels like there should be a way for a maintainer to specify
something like `go.modvendor …`, with a tool like `go2port` generating the
block by running `go mod vendor` in the checked-out repo. What would be
in there…I don't know. But at a first pass, it might be a name
(dependency-version-vendor.tar.gz?) and a digest (`find vendor -type f
-print0 | xargs -0 dos2unix && tar cfz - vendor | sha256sum`?). The first
build machine would take that and *create* the reusable dependency archive
for storage, which every machine thereafter could download.
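A rough sketch of what computing such a digest might look like, under assumptions that are not part of the proposal itself: instead of hashing `tar` output (which embeds timestamps and depends on file ordering), it hashes sorted relative paths and file contents, and it leaves out line-ending normalization. The names `tree_digest` and `write_tree` are hypothetical, and nothing here is an existing MacPorts or go2port feature.

```python
import hashlib
import os
import tempfile


def tree_digest(root: str) -> str:
    """SHA-256 over sorted relative paths and file contents under `root`."""
    digest = hashlib.sha256()
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, full))
    for rel, full in sorted(entries):
        with open(full, "rb") as fh:
            data = fh.read()
        # Length-prefix each field so bytes cannot shift between a path
        # and its contents without changing the digest.
        for field in (rel.encode("utf-8"), data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


def write_tree(root: str, files: dict) -> None:
    """Write {relative path: bytes} under `root` (demo helper)."""
    for rel, data in files.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # Same made-up files, created in a different order.
        write_tree(a, {"x/one.go": b"package x\n", "y/two.go": b"package y\n"})
        write_tree(b, {"y/two.go": b"package y\n", "x/one.go": b"package x\n"})
        print(tree_digest(a) == tree_digest(b))
```

The point of hashing paths and contents rather than an archive is that two machines running `go mod vendor` on the same checkout should arrive at the same value regardless of file timestamps or creation order.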
I’m riffing here, but there *has* to be a better approach, especially for
many small packages in large repos, than what we're doing now. Even if it
has a bootstrap problem.
-a
--
Austin Ziegler • halostatue at gmail.com • austin at halostatue.ca
http://www.halostatue.ca/ • http://twitter.com/halostatue