[MacPorts] #67336: BSD tar can create corrupted archives on Catalina, Big Sur, Monterey, Ventura
MacPorts
noreply at macports.org
Mon May 8 02:12:15 UTC 2023
---------------------+-------------------------------------------------
Reporter: catap | Owner: (none)
Type: defect | Status: new
Priority: Normal | Milestone:
Component: base | Version: 2.8.1
Resolution: | Keywords: catalina, bigsur, monterey, ventura
Port: |
---------------------+-------------------------------------------------
Comment (by ryandesign):
Replying to [comment:24 catap]:
> Unfortunately I'm only able to reproduce it via GitHub CI. I'm not able to
reproduce it on any of my systems. But the same port had the same issue on
the same macOS 13 on the buildbots.
Yes, I know pari-2.13.4_0.darwin_22.x86_64.tbz2 experienced the problem on
the buildbot but none of the other builds of that version did, even though
we know this issue can also affect earlier OS versions, and no previous
pari versions were on the packages server, so we don't have enough data to
say whether this was a one-time random occurrence or something that
happens more regularly.
The Darwin 22 x86_64 buildbot worker is different from the others in that
it is running on a MacBook Pro with an external hard drive (until I can
figure out how to get it installed on one of the Xserves). Certainly we
know that APFS is not optimal on hard drives and perhaps this bug is more
common with hard drives. I don't know what hardware the GitHub Actions CI
runners use. However, the problem is not exclusive to hard drives; the two
Macs I saw the problem on with my mongodb builds were running off their
internal SSDs.
> I feel that this port, and the ability to reproduce the issue with it, is a
good way for us to track it down and fix it.
Agreed, it is!
> Frankly speaking, I don't like the idea of flushing caches. I feel that it
covers up the issue.
Hard to say at this point since we don't know what the issue is. If the
issue is that there is a bug in the OS disk cache code such that it can
contain stale data, then flushing the cache on OS versions where that bug
exists is a reasonable workaround.
Several years ago Homebrew implemented a feature allowing formulas to opt
in to flushing the cache, and they haven't removed it, which suggests that
this workaround works sufficiently well.
Replying to [comment:25 catap]:
> Thus, running two tars neither fixes nor sidesteps the issue.
Thanks, that's interesting!
The articles I read about sparse files said that the OS may create them
when it thinks it would save space, even if the program creating the file
didn't request that. And reading back a sparse file in the normal way
would deliver the data in the normal way, even if it was stored sparsely on
disk. The program wouldn't ever need to know that a file had been sparse.
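To illustrate that last point, here is a minimal Python sketch (the helper
names `make_sparse_file` and `allocation_info` are hypothetical, not taken
from MacPorts or libarchive). It skips ahead with `seek()` instead of
writing zeros, which may leave a hole on filesystems that support them,
and then compares the logical size with the space actually allocated.
Whether a hole is really created depends on the filesystem, but reading
the file back always yields zero bytes for the skipped range.

```python
import os
import tempfile

HOLE = 1 << 20  # 1 MiB gap; an arbitrary size chosen for illustration


def make_sparse_file(path):
    """Write a small header, skip HOLE bytes without writing, then write a tail."""
    with open(path, "wb") as f:
        f.write(b"start")
        f.seek(HOLE, os.SEEK_CUR)  # the skipped range MAY be stored as a hole
        f.write(b"end")


def allocation_info(path):
    """Return (logical size, bytes actually allocated on disk)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on both macOS and Linux.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.bin")
        make_sparse_file(path)
        size, allocated = allocation_info(path)
        with open(path, "rb") as f:
            data = f.read()
        # A normal read returns zeros for the gap, whether it is sparse or not.
        assert data[5:5 + HOLE] == bytes(HOLE)
        print(f"size={size} allocated={allocated}")
```

On a filesystem that stored the gap as a hole, `allocated` will be much
smaller than `size`; a program that simply reads the file never sees the
difference.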
I assumed that tar (either BSD or GNU) would simply read the original
files, not care or know whether they were sparse, and write them to the
archive. However, after searching through the
[https://github.com/libarchive/libarchive/issues libarchive issue
tracker], I see that there have been bug reports about its handling of
sparse files before, so libarchive must contain code to identify sparse
files and handle them in a special way. Perhaps that code is buggy and we
should report the problem to libarchive. Or maybe they will have a better
idea about how to track down what's happening.
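For illustration, one common way for a program to locate sparse regions
is to ask the OS via `lseek()` with the `SEEK_DATA` and `SEEK_HOLE` whence
values. Whether libarchive uses exactly this mechanism on macOS is an
assumption here, not something established in this ticket, and
`data_regions` is a hypothetical helper. The sketch shows why such code
depends on the OS answering correctly: an archiver that trusts these
calls would record any range reported as a hole as zeros instead of
reading it.

```python
import errno
import os


def data_regions(path):
    """Return (start, end) byte ranges that the OS reports as containing data.

    Everything outside these ranges is reported as a hole and reads back as
    zeros. Requires a platform that provides os.SEEK_DATA and os.SEEK_HOLE.
    """
    regions = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Next offset at or after `offset` that holds data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # no more data: the rest of the file is a hole
                raise
            # Next hole after that data; end of file counts as an implicit hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            regions.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return regions
```

On a filesystem without hole support, the whole file comes back as a
single data region covering its full size.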
I've also now learned that
[https://www.gnu.org/software/tar/manual/html_node/sparse.html GNU tar has
support for special handling of sparse files] too. But whereas libarchive
seems to do this by default, GNU tar doesn't unless you use the `--sparse`
flag. You could test again with
`${prefix}/bin/gtar -cvf - --sparse . 2>/dev/null | wc`
and see if that also shows the problem.
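Piping through `wc` only shows the size of the output. A more direct
check, sketched here in Python under the assumption that the archive is
written to a file and the source tree has not changed since archiving, is
to compare every regular file in the archive with its original on disk.
`mismatched_members` and `sha256_of` are hypothetical helper names.
Python's `tarfile` module reads ustar, pax and GNU format archives, and
the `"r:*"` mode auto-detects compression such as the bzip2 used by
`.tbz2` files.

```python
import hashlib
import os
import tarfile


def sha256_of(stream, chunk_size=1 << 16):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def mismatched_members(archive_path, root):
    """Return names of regular-file members whose archived bytes differ from root/<name>."""
    bad = []
    with tarfile.open(archive_path, "r:*") as tar:  # auto-detects gzip/bzip2/xz
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, device nodes, etc.
            archived = tar.extractfile(member)
            with open(os.path.join(root, member.name), "rb") as original:
                if sha256_of(archived) != sha256_of(original):
                    bad.append(member.name)
    return bad
```

A non-empty result lists members whose archived contents no longer match
the files they were created from.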
--
Ticket URL: <https://trac.macports.org/ticket/67336#comment:26>
MacPorts <https://www.macports.org/>
Ports system for macOS