[MacPorts] #67336: BSD tar can create corrupted archives on Catalina, Big Sur, Monterey, Ventura

MacPorts noreply at macports.org
Mon May 8 02:12:15 UTC 2023


#67336: BSD tar can create corrupted archives on Catalina, Big Sur, Monterey,
Ventura
---------------------+-------------------------------------------------
  Reporter:  catap   |      Owner:  (none)
      Type:  defect  |     Status:  new
  Priority:  Normal  |  Milestone:
 Component:  base    |    Version:  2.8.1
Resolution:          |   Keywords:  catalina, bigsur, monterey, ventura
      Port:          |
---------------------+-------------------------------------------------

Comment (by ryandesign):

 Replying to [comment:24 catap]:
 > Unfortunately I'm able to reproduce it via GitHub CI. I'm not able to
 reproduce it on any of my systems. But the same port had the same issue on
 the same macOS 13 on the build bots.

 Yes, I know pari-2.13.4_0.darwin_22.x86_64.tbz2 experienced the problem on
 the buildbot, but none of the other builds of that version did, even though
 we know this issue can also affect earlier OS versions. No previous pari
 versions were on the packages server, so we don't have enough data to say
 whether this was a one-time random occurrence or something that happens
 more regularly.

 The Darwin 22 x86_64 buildbot worker is different from the others in that
 it is running on a MacBook Pro with an external hard drive (until I can
 figure out how to get it installed on one of the Xserves). Certainly we
 know that APFS is not optimal on hard drives and perhaps this bug is more
 common with hard drives. I don't know what hardware the GitHub Actions CI
 runners use. However, the problem is not exclusive to hard drives; the two
 Macs I saw the problem on with my mongodb builds were running off their
 internal SSDs.

 > I feel that this port / ability to reproduce the issue is a good way for
 us to track it and fix it.

 Agreed, it is!

 > Frankly speaking I don't like the idea of flushing caches. I feel that it
 covers up the issue.

 Hard to say at this point since we don't know what the issue is. If the
 issue is that there is a bug in the OS disk cache code such that it can
 contain stale data, then flushing the cache on OS versions where that bug
 exists is a reasonable workaround.

 Several years ago, Homebrew implemented a feature that lets formulas opt
 in to flushing the cache, and they haven't removed it, which suggests that
 this workaround must be working sufficiently well.

 Replying to [comment:25 catap]:
 > Thus, run two tars haven't fix or overstep an issue.

 Thanks, that's interesting!

 The articles I read about sparse files said that the OS may create them
 when it thinks it would save space, even if the program creating the file
 didn't request that. And reading back a sparse file in the normal way
 would deliver the data in the normal way, even if it was stored sparsely
 on disk. The program wouldn't ever need to know that a file had been sparse.
 I assumed that tar (either BSD or GNU) would simply read the original
 files, not care or know whether they were sparse, and write them to the
 archive. However, after searching through the
 [https://github.com/libarchive/libarchive/issues libarchive issue
 tracker], I see that there have been bug reports about its handling of
 sparse files before, so libarchive must contain code to identify sparse
 files and handle them in a special way. Perhaps that code is buggy and we
 should report the problem to libarchive. Or maybe they will have a better
 idea about how to track down what's happening.
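
 To make the distinction above concrete, here is a minimal Python sketch,
 not taken from libarchive or MacPorts. The helper names
 `write_file_with_hole` and `allocated_less_than_size` are made up for
 illustration. It writes a file with an unwritten gap, reads it back in the
 normal way, and compares the logical size with the allocated blocks.
 Whether the gap is really stored as a hole depends on the filesystem, so
 the last line may print either True or False.

```python
import os
import tempfile


def write_file_with_hole(path, hole_size):
    """Write 4 bytes, skip hole_size bytes without writing, then write 4 more."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(hole_size, os.SEEK_CUR)  # leaves a gap the OS may store as a hole
        f.write(b"tail")


def allocated_less_than_size(size, blocks, block_unit=512):
    """True if the on-disk allocation is smaller than the logical size.

    st_blocks is reported in 512-byte units on POSIX systems.
    """
    return blocks * block_unit < size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "holey.bin")
        write_file_with_hole(path, 1024 * 1024)
        st = os.stat(path)
        with open(path, "rb") as f:
            data = f.read()
        # A normal read always sees the full logical contents, zeros included.
        print("logical size:", st.st_size, "bytes read:", len(data))
        # Whether the hole is really unallocated depends on the filesystem.
        print("stored sparsely:", allocated_less_than_size(st.st_size, st.st_blocks))
```

 A program that only uses plain reads, as in this sketch, never sees the
 holes. An archiver that wants to preserve them has to ask the OS where
 they are, and that is the extra code path where a bug could hide.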

 I've also now learned that
 [https://www.gnu.org/software/tar/manual/html_node/sparse.html GNU tar has
 support for special handling of sparse files] too. But whereas libarchive
 seems to do this by default, GNU tar doesn't unless you use the `--sparse`
 flag. You could test again with
 `${prefix}/bin/gtar -cvf - --sparse . 2>/dev/null | wc`
 and see if that also shows the problem.
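
 If it helps to repeat that kind of test several times, the comparison can
 be scripted. This is only a sketch: `archive_digests` and `is_consistent`
 are hypothetical helpers, and it assumes that archiving an unchanged
 directory should produce the same bytes on every run. It runs a command
 that writes an archive to stdout a few times and compares the size and
 SHA-256 of each result.

```python
import hashlib
import subprocess


def archive_digests(cmd, cwd=None, runs=3):
    """Run cmd several times and return (byte count, sha256 hex) of each stdout."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,     # the archive itself
            stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
            check=True,
        )
        results.append((len(proc.stdout), hashlib.sha256(proc.stdout).hexdigest()))
    return results


def is_consistent(results):
    """True if every run produced the same size and digest."""
    return len(set(results)) == 1
```

 For example, assuming the default prefix /opt/local and a hypothetical
 destroot path in `destroot_dir`, it could be called as
 `archive_digests(["/opt/local/bin/gtar", "-cf", "-", "--sparse", "."], cwd=destroot_dir)`.
 The `-v` flag is left out because stderr is discarded anyway. Differing
 sizes or digests between runs would point at the same symptom the `wc`
 test looks for.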

-- 
Ticket URL: <https://trac.macports.org/ticket/67336#comment:26>
MacPorts <https://www.macports.org/>
Ports system for macOS

